Abstract

A new class of algorithms is introduced and analyzed for bound and linearly constrained
optimization problems with stochastic objective functions and a mixture of design
variable types. The generalized pattern search (GPS) class of algorithms is extended to a
new problem setting in which objective function evaluations require sampling from a model
of a stochastic system. The approach combines GPS with ranking and selection (R&S)
statistical procedures to select new iterates. The derivative-free algorithms require only
black-box simulation responses and are applicable over domains with mixed variables
(continuous, discrete numeric, and discrete categorical) to include bound and linear constraints
on the continuous variables. A convergence analysis for the general class of algorithms
establishes almost sure convergence of an iteration subsequence to stationary points
appropriately defined in the mixed-variable domain. Additionally, specific algorithm instances
are implemented that provide computational enhancements to the basic algorithm.
Implementation alternatives include the use of modern R&S procedures designed to provide
efficient sampling strategies and the use of surrogate functions that augment the search by
approximating the unknown objective function with nonparametric response surfaces. In
a computational evaluation, six variants of the algorithm are tested along with four
competing methods on 26 standardized test problems. The numerical results validate the use
of advanced implementations as a means to improve algorithm performance.
Pattern Search Ranking and Selection Algorithms for
Mixed-Variable Optimization of Stochastic Systems

Chapter  1  -  Introduction 

1.1 Problem Setting

Consider  the  optimization  of  a  stochastic  system  in  which  the  objective  is  to  find  a  set 
of  controllable  system  parameters  that  minimize  some  performance  measure  of  the  system. 
This  situation  is  representative  of  many  real-world  optimization  problems  in  which  random 
noise  is  present  in  the  evaluation  of  the  objective  function.  In  many  cases,  the  system  is  of 
sufficient  complexity  so  that  the  objective  function,  representing  the  performance  measure 
of  interest,  cannot  be  formulated  analytically  and  must  be  evaluated  via  a  representative 
model  of  the  system.  In  particular,  the  use  of  simulation  is  emphasized  as  a  means  of 
characterizing  and  analyzing  system  performance.  The  term  simulation  is  used  in  a  generic 
sense  to  indicate  a  numerical  procedure  that  takes  as  input  a  set  of  controllable  system 
parameters  (design  variables)  and  generates  as  output  a  response  for  the  measure  of  interest. 
It  is  assumed  that  the  variance  of  this  measure  can  be  reduced  at  the  expense  of  additional 
computational  effort,  e.g.,  repeated  sampling  from  the  simulation. 

Applications involve the optimization of system designs where the systems under analysis
are represented as simulation models, such as those used to model manufacturing systems,
production-inventory situations, communication or other infrastructure networks, logistics
support systems, or airline operations. In these situations, a search methodology is
used  to  drive  the  search  for  the  combination  of  values  of  the  design  variables  that  optimize 
a  system  measure  of  performance.  A  model  of  such  a  stochastic  optimization  methodology 
via  simulation  is  depicted  in  Figure  1.1. 


Figure  1.1.  Model  for  Stochastic  Optimization  via  Simulation 


The random performance measure may be modeled as an unknown response function
$F(x, \omega)$, which depends upon an $n$-dimensional vector of controllable design variables
$x \in \mathbb{R}^n$ and the vector $\omega$, which represents random effects inherent to the
system. The objective function $f$ of the optimization problem is the expected performance of
the system, given by

$$f(x) = E_P[F(x, \omega)] = \int_{\Omega} F(x, \omega)\, P(d\omega), \qquad (1.1)$$

where $\omega \in \Omega$ can be considered an element of an underlying probability space
$(\Omega, \mathcal{F}, P)$ with sample space $\Omega$, sigma-field $\mathcal{F}$, and probability
measure $P$. It is assumed that the probability distribution that defines the response
$F(x, \omega)$ is unknown but can be sampled.
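
The relationship between the sampled response and the objective in (1.1) can be illustrated with a minimal sketch. The `simulate` function below is a hypothetical stand-in for a black-box simulation (a quadratic plus Gaussian noise, chosen only for illustration); the objective is estimated by a sample mean, and the standard error shrinks as more samples are drawn, which is the sense in which variance can be reduced at the expense of additional computational effort.

```python
import random
import statistics

def simulate(x, rng):
    """Hypothetical black-box response F(x, omega): a quadratic plus Gaussian noise."""
    true_value = sum((xi - 1.0) ** 2 for xi in x)  # stands in for the unknown f(x)
    return true_value + rng.gauss(0.0, 1.0)        # omega enters through the noise draw

def estimate_objective(x, n_samples, rng):
    """Return the sample mean of n_samples responses and its standard error."""
    samples = [simulate(x, rng) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / n_samples ** 0.5
    return mean, std_err

rng = random.Random(0)
small = estimate_objective([0.0, 0.0], 10, rng)    # cheap but imprecise
large = estimate_objective([0.0, 0.0], 1000, rng)  # 100 times the sampling effort
print("10 samples:   mean=%.3f  std_err=%.3f" % small)
print("1000 samples: mean=%.3f  std_err=%.3f" % large)
```

In this toy setting the true value at the design point is 2, so the two estimates can be compared directly; in the problem setting of this research the true value is never available.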

Even for noise-free system responses obtained via simulation, finding optimal solutions
using traditional optimization approaches can be difficult since the structure of $f$ is
unknown, analytical derivatives are unavailable, and numerical evaluation of $f$ may involve
expensive simulation runs. The presence of random variation further complicates matters
because $f$ cannot be evaluated exactly and derivative approximating techniques, such as
finite differencing, become problematic. Estimating $f$ requires the aggregation of repeated
samples of the response $F$, making it difficult to determine conclusively if one design is
better than another and further hindering search methods that explicitly rely on directions
of improvement. Multiple samples at each design point imply the necessity of extra
computational effort to obtain sufficient accuracy, thereby reducing the number of designs that
can be visited given a fixed computational budget.

Additional complications arise when elements of the design vector are allowed to be
non-continuous, either discrete-numeric (e.g., integer-valued) or categorical. Categorical
variables are those that can only take on values from a predefined list that have no ordinal
relationship to one another. These restrictions are common for realistic stochastic systems.
As examples, a stochastic communication network containing a buffer queue at each router
may have an integer-valued design variable for the number of routers and a categorical
design variable for queue discipline (e.g., first-in-first-out (FIFO), last-in-first-out (LIFO),
or priority) at each router; an engineering design problem may have a categorical design
variable representing material types; a military scenario or homeland security option may
have categories of operational risk. The class of optimization problems that includes
continuous, discrete-numeric, and categorical variables is known as mixed variable programming
(MVP) problems. In this research, discrete-numeric and categorical variables are grouped
into a discrete variable class by noting that categorical variables can be mapped to discrete
numerical values. For example, integer values are assigned to the queue discipline
categorical variable (e.g., 1 = FIFO, 2 = LIFO, and 3 = priority) even though the values do not
conform to any inherent ordering that the numerical value suggests.
This research considers the optimization of stochastic systems with mixed variables,
for which the continuous variable values are restricted by bound and linear constraints.
The target problem class is defined as

$$\min_{x \in \Theta} f(x), \qquad (1.2)$$

where $f : \Theta \to \mathbb{R}$ is a function of unknown analytical form and $x$ is the vector
of design variables from the mixed-variable domain $\Theta$. This domain is partitioned into
continuous and discrete domains $\Theta^c$ and $\Theta^d$, respectively, where some or all of the
discrete variables may be categorical. Each vector $x \in \Theta$ is denoted as $x = (x^c, x^d)$,
where $x^c$ are the continuous variables of dimension $n^c$ and $x^d$ are the discrete variables
of dimension $n^d$. The domain of the continuous variables is restricted by bound and linear
constraints $\Theta^c = \{x^c \in \mathbb{R}^{n^c} : l \le A x^c \le u\}$, where
$A \in \mathbb{R}^{m^c \times n^c}$, $l, u \in (\mathbb{R} \cup \{\pm\infty\})^{m^c}$, $l < u$, and
$m^c \ge n^c$.

The domain of the discrete variables $\Theta^d \subseteq \mathbb{Z}^{n^d}$ is represented as a
subset of the integers by mapping each discrete variable value to a distinct integer.
Furthermore, due to inherent variation in the stochastic system, the function $f$ cannot be
evaluated exactly but must be estimated via observations of $F$ obtained from a representative
model of the stochastic system. Iterative search methods are necessarily affected by system
noise such that the sequence of iterates may be considered random vectors. Hence, the
conventional notation $X_k$ to denote a random quantity for the design at iteration $k$ is used
to distinguish it from the notation $x_k$ used to denote a realization of $X_k$.
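
A small sketch may help make the mixed-variable domain concrete. The constraint data, the `QUEUE` mapping, and the router-count range below are hypothetical values chosen for illustration; the check simply tests the bound and linear constraints on the continuous part row by row and tests membership of the discrete part in a finite set of integer tuples.

```python
import math

def is_feasible(x_cont, x_disc, A, lower, upper, discrete_domain):
    """Check l <= A x^c <= u row by row and membership of x^d in the discrete set."""
    for row, lo, hi in zip(A, lower, upper):
        value = sum(a * x for a, x in zip(row, x_cont))
        if not (lo <= value <= hi):
            return False
    return tuple(x_disc) in discrete_domain

# Hypothetical continuous domain: 0 <= x1, x2 <= 10 and x1 + x2 <= 12.
# Bounds are written as rows of A, so there are 3 constraint rows for 2 variables.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lower = [0.0, 0.0, -math.inf]   # a missing lower limit is represented by -infinity
upper = [10.0, 10.0, 12.0]

# Hypothetical discrete domain: a categorical queue discipline mapped to integers,
# paired with an integer-valued router count between 1 and 4.
QUEUE = {"FIFO": 1, "LIFO": 2, "priority": 3}
discrete_domain = {(q, r) for q in QUEUE.values() for r in range(1, 5)}

print(is_feasible([3.0, 4.0], (QUEUE["FIFO"], 2), A, lower, upper, discrete_domain))
print(is_feasible([8.0, 8.0], (QUEUE["FIFO"], 2), A, lower, upper, discrete_domain))
```

The integer codes in `QUEUE` carry no ordering information; they only label the categories, so nothing in the check compares them by magnitude.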

Relaxation techniques commonly used for mixed-integer problems, such as
branch-and-bound, are not applicable to the mixed-variable case because the objective and response
functions are defined only at the discrete settings of the categorical variables; therefore,
relaxing the “discreteness” of these variables is not possible. Small numbers of categorical
variables can sometimes be treated by exhaustively enumerating their possible values,
but this approach quickly becomes computationally prohibitive. In this case, a common
approach is to conduct a parametric study, in which expert knowledge of the underlying
problem is applied to simply select a range of appropriate values, evaluate the objective,
and select the best alternative. Of course, this is a sub-optimal approach. Thus, it is
desirable to have an optimization method that can treat MVP problems rigorously.

1.2  Purpose  of  the  Research 

In designing appropriate solution algorithms for stochastic optimization problems, the
following characteristics are considered to be important.

1.  Provably  convergent.  Convergent  algorithms  are  desirable  to  guarantee  that  a  search 
procedure  asymptotically  approaches  at  least  a  local  optimal  solution  when  starting 
from  an  arbitrary  point.  With  random  error  present  in  objective  function  evaluation, 
proving  convergence  requires  additional  assumptions  and  is  typically  established  in 
terms of probability (e.g., almost sure convergence).

2.  General  purpose.  To  ensure  applicability  to  the  widest  possible  range  of  problems, 
the  following  conditions  of  an  algorithm  are  desired. 

a. It is valid over all combinations of variable types (i.e., MVP problems).

b.  It  requires  neither  knowledge  of  the  underlying  simulation  model  structure  nor 
modification  to  its  code.  Thus,  it  treats  the  model  as  a  black-box  function 
evaluator,  obtaining  an  output  based  on  a  set  of  controllable  inputs. 

3.  Comparatively  efficient.  To  make  the  algorithm  useful  and  viable  for  practitioners, 
the  algorithm  should  perform  well  for  some  subclass  of  the  target  problem  class  (1.2) 
in comparison to competing methods in terms of some performance measure (e.g., the
number  of  response  samples  or  computer  processing  time  required  to  achieve  a  specified 
improvement  in  objective  function  value). 

Various  sampling-based  search  strategies  have  been  applied  to  stochastic  optimization 
problems  similar  to  (1.2).  All  of  the  methods  discussed  in  Chapter  2  are  deficient  with 
respect  to  at  least  one  of  the  aforementioned  convergence  and/or  general-purpose  properties. 
The  focus  of  this  research  is  sharpened  by  the  following  statement  of  the  research  problem. 
1.2.1 Problem Statement


There exists no provably convergent class of methods for solving mixed-variable
stochastic optimization problems. Such methods should require neither knowledge of nor
modification to the underlying stochastic model. Algorithmic implementations of these methods
should account for the practical need of computational efficiency.

1.2.2 Research Objectives

The purpose of this research is to rigorously treat problems of the type (1.2) in their
most general form (i.e., mixed variables and black-box simulation) while addressing the
need for computationally efficient implementations. The methodology extends a class of
derivative-free algorithms, known as pattern search, that trace their history to early
attempts to optimize systems involving random error. Modern pattern search algorithms
are direct descendants of Box’s [33] original proposal to replace regression analysis of
experimental data with direct inspection of the data to improve industrial efficiency [144].
Although pattern search methods have enjoyed popularity for deterministic optimization
since the 1960s, only within the last decade has a generalized algorithm class been shown
to be convergent [143]. Even more recently, the class of generalized pattern search (GPS)
algorithms has been extended to problems with mixed variables [13] without sacrifice to the
convergence theory. Therefore, with respect to convergence and general-purpose properties,
GPS algorithms show great promise for application to mixed-variable stochastic
optimization problems. To date, these methods have not been formally analyzed for
problems with noisy objective functions.

This  research  extends  the  class  of  GPS  algorithms  to  stochastic  optimization  problems. 
The approach calls for the use of ranking and selection (R&S) statistical methods (see [140]
for a survey of such methods) to control error when choosing new iterates from among
candidate designs during a search iteration. Specific objectives of the research are as follows.

1.  Extend  the  GPS  algorithmic  framework  to  problems  with  noisy  response  functions  by 
employing  R&S  methods  for  the  selection  of  new  iterates.  Prove  that  algorithms  of 
the  extended  class  produce  a  subsequence  of  iterates  that  converges,  in  a  probabilistic 
sense,  to  a  point  that  satisfies  necessary  conditions  for  optimality. 

2.  Implement  specific  variants  within  the  GPS/R&S  algorithm  framework  that  offer 
comparatively  efficient  performance.  Implementation  focuses  on  two  general  areas: 

a.  The  use  of  modern  ranking  and  selection  methods  that  offer  efficient  sampling 
strategies. 

b.  The  use  of  surrogate  functions  to  approximate  the  objective  function  as  a  means 
to  accelerate  the  search. 

3.  Test  the  specific  methods  on  a  range  of  appropriate  test  problems  and  evaluate 
computational  efficiency  with  respect  to  competing  methods. 

1.3 Overview

This dissertation document is organized as follows. Chapter 2 reviews the relevant
literature for sampling-based stochastic optimization and generalized pattern search
methods, to include a discussion of convergence and generality properties. Chapter 3 presents a
new mixed-variable GPS/R&S algorithmic framework and attendant convergence theory.
Chapter 4 describes the specific algorithm options that were implemented for the testing
phase of the research. Chapter 5 presents the computational evaluation of algorithm
implementations against competing methods over a range of analytical test problems. Chapter
6 offers conclusions, outlines the research contributions, and suggests directions for further
research.
Chapter 2 - Literature Review

Prior  to  investigating  new  methods  to  solve  mixed-variable  stochastic  optimization 
problems,  a  review  of  the  existing  literature  is  warranted.  Section  2.1  surveys  methods 
applied  to  stochastic  optimization  problems,  with  particular  emphasis  on  sampling-based 
methods  because  of  their  applicability  to  optimization  via  simulation.  The  survey  focuses 
on methods that, in some way, address the desired properties outlined in Section 1.2,
relative to the target class of problems (1.2). This review demonstrates that a gap exists in the
literature for treating general, mixed-variable stochastic optimization problems, which sets
the stage for the review of generalized pattern search methods in Section 2.2. Generalized
pattern search (GPS) methods, selected as the algorithmic foundation for treating the
target problems (1.2) in this research, have been shown to be convergent for mixed-variable
deterministic optimization problems in a series of recent results. A chapter summary
illustrates the need for rigorous methods to treat mixed-variable stochastic optimization
problems and explains why an extension to generalized pattern search is a valid approach for
such problems.

2.1  Methods  for  Stochastic  Optimization 

Stochastic  optimization  may  be  defined  in  terms  of  randomness  involved  in  either 
or  both  of  (a)  the  evaluation  of  the  objective  or  constraint  functions,  or  (b)  the  search 
procedure  itself  [134,  p.  7].  Throughout  this  document,  stochastic  optimization  refers  to 
the  former.  A  further  distinction  is  made  regarding  the  methods  considered  in  this  section. 
In  particular,  methods  usually  grouped  under  the  heading  of  stochastic  programming  are  not 
considered.  Stochastic  programming  models  typically  assume  that  probability  distributions 
governing  the  data  are  known  (or  can  be  estimated)  [117,  p.  7].  This  fact  is  often  exploited 
in constructing effective solution strategies. In the present problem setting, the probability
distribution of the response function is assumed to be unknown (although some limited
assumptions may be made) but can be sampled.

Most  sampling-based  methods  for  stochastic  optimization  can  be  grouped  into  one 
of  five  categories:  stochastic  approximation,  random  search,  ranking  and  selection,  direct 
search,  and  response  surface  methods.  Each  class  of  methods  is  described  in  the  following 
subsections.  A  more  in-depth  account  of  these  and  other  methods  is  contained  in  a  number 
of review articles on simulation optimization [9, 10, 17, 34, 46, 48, 63, 92, 103, 119, 137, 138].

2.1.1  Stochastic  Approximation 

Stochastic approximation (SA) is a gradient-based method that “concerns recursive
estimation of quantities in connection with noise contaminated observations” [83]. In essence,
it  is  the  stochastic  version  of  the  steepest  descent  method  that  rigorously  accommodates 
noisy  response  functions.  These  methods  possess  a  rich  convergence  theory  and  certain 
variants  can  be  quite  efficient  [134,  Chap.  7],  but  apply  primarily  to  continuous  domains 
only,  and  therefore  lack  generality. 

Early applications of SA to simulation-based optimization appeared in the late 1970s
(e.g., see [18]) and, since then, SA has been the most popular and widely used method for
optimization of stochastic simulation models [137]. The SA principle first appeared in
1951 in an algorithm introduced by Robbins and Monro [115] for finding the root of an
unconstrained one-dimensional noisy function. In general, SA applies to problems with only
continuous variables. A multivariate version of the Robbins-Monro algorithm, adapted from
[10,  p.  317],  is  shown  in  Figure  2.1.  In  the  algorithm,  the  sequence  of  step  sizes  (also 
known  as  the  gain  sequence)  must  satisfy  restrictions  that  are  critical  to  the  convergence 
theory. 
Robbins-Monro Stochastic Approximation Algorithm

Initialization: Choose a feasible starting point $X_0 \in \Theta$. Set step size $a_0 > 0$ and
suitable stopping criteria. Set the iteration counter $k$ to 0.

1. Given $X_k$, generate an estimate $\hat{\gamma}(X_k)$ of the gradient $\nabla f(X_k)$.

2. Compute

$$X_{k+1} = X_k - a_k \hat{\gamma}(X_k). \qquad (2.1)$$

3. If the stopping criteria are satisfied, then stop and return $X_{k+1}$ as the estimate of the
optimal solution. Otherwise, update $a_{k+1} \in (0, a_k)$ and $k = k + 1$ and return to Step 1.


Figure 2.1. Robbins-Monro Algorithm for Stochastic Optimization (adapted from [10])


Kiefer and Wolfowitz [68] extended the SA principle to finding the maximum of
one-dimensional noisy functions using central finite differences to estimate the derivative. Blum
[28] extended the Kiefer-Wolfowitz algorithm to the multi-dimensional case. The use of
finite differences to estimate the gradient in Step 1 of the algorithm in Figure 2.1 is often
called finite difference stochastic approximation (FDSA). Using central differences, the $i$th
element of the gradient is estimated at iteration $k$ according to

$$\hat{\gamma}_i(X_k) = \frac{\bar{F}(X_k + c_k e_i) - \bar{F}(X_k - c_k e_i)}{2 c_k}, \quad i = 1, \ldots, n, \qquad (2.2)$$

where $e_i$ is the $i$th coordinate vector and $\bar{F}(X_k \pm c_k e_i)$ denotes an estimate of $f$
at $X_k \pm c_k e_i$ for some perturbation setting $c_k > 0$, perhaps a single sample or the mean
of several samples of $F(X_k \pm c_k e_i, \omega)$. Note the reliance of the perturbation parameter
on $k$. As with the gain sequence, the convergence theory relies on restrictions on the
sequence $c_k$.
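
The recursion (2.1) with the central-difference estimator (2.2) can be sketched in a few lines. This is an illustrative sketch only: `noisy_response` is a hypothetical noisy quadratic standing in for a simulation, the gain sequence is the harmonic choice a/k, and the perturbation sequence c/k^0.25 and all constants are assumptions made for the example rather than values taken from the cited literature.

```python
import random

def noisy_response(x, rng):
    """Hypothetical F(x, omega): f(x) = sum (x_i - 1)^2 observed with Gaussian noise."""
    return sum((xi - 1.0) ** 2 for xi in x) + rng.gauss(0.0, 0.1)

def fdsa_gradient(x, c_k, rng):
    """Central-difference estimate as in (2.2), one response sample per perturbed point."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += c_k
        minus = list(x)
        minus[i] -= c_k
        grad.append((noisy_response(plus, rng) - noisy_response(minus, rng)) / (2.0 * c_k))
    return grad

def robbins_monro_fdsa(x0, iterations, a=0.5, c=0.5, seed=0):
    """Run the recursion (2.1) for a fixed number of iterations (unconstrained case)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, iterations + 1):
        a_k = a / k            # harmonic gain sequence
        c_k = c / k ** 0.25    # assumed perturbation sequence, decaying more slowly
        g = fdsa_gradient(x, c_k, rng)
        x = [xi - a_k * gi for xi, gi in zip(x, g)]
    return x

x_final = robbins_monro_fdsa([4.0, -3.0], iterations=2000)
print(x_final)  # the minimizer of the toy objective is (1, 1)
```

Each iteration uses 2n response samples, which is the cost that motivates cheaper gradient estimators in higher dimensions.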

A  disadvantage  of  finite-differencing  is  that  it  can  be  expensive,  requiring  response 
function  samples  at  each  of  2n  design  points  (using  central  differences)  to  estimate  the 
gradient.  An  alternative,  and  more  efficient,  gradient  estimator  is  based  on  the  concept  of 
randomly selecting coordinate directions for use in computing $\hat{\gamma}(x)$. As a
generalization of a random direction method proposed in [44], Spall [132] derived the following
simultaneous perturbation gradient estimator for deterministic response functions,

$$\hat{\gamma}_i(X_k) = \frac{\bar{F}(X_k + c_k d_k) - \bar{F}(X_k - c_k d_k)}{2 c_k d_{ki}}, \quad i = 1, \ldots, n, \qquad (2.3)$$

where $d_k = [d_{k1}, \ldots, d_{kn}]$ represents a vector of random perturbations and $c_k > 0$ has
the same meaning as in (2.2). The convergence theory of this approach was subsequently
extended to noisy response functions in [133]. Through careful construction of the
perturbation vector $d_k$, the simultaneous perturbation stochastic approximation (SPSA) method
avoids the large number of samples required in FDSA by sampling the response function at
only two design points perturbed along the directions $d_k$ and $-d_k$ from the current iterate,
regardless of the dimension $n$. The perturbation vector $d_k$ must satisfy certain statistical
properties defined in [134, p. 183]. Specifically, the $\{d_{ki}\}$ must be independent for all $k$
and $i$, identically distributed for all $i$ at each $k$, symmetrically distributed about zero, and
uniformly bounded in magnitude for all $k$ and $i$. The most commonly used distribution for
the elements of $d_k$ is a symmetric Bernoulli distribution; i.e., $\pm 1$, each with probability 0.5 [48].
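
A minimal sketch of the simultaneous perturbation estimator with symmetric Bernoulli perturbations follows. The function name `spsa_gradient`, the test function `quadratic`, and the constants are hypothetical choices for illustration. The sketch counts response evaluations to show that one estimate costs two evaluations regardless of dimension, and averages many estimates on a noise-free function to show that the estimator is centered on the true gradient in that case.

```python
import random

def spsa_gradient(response, x, c_k, rng):
    """Two-evaluation gradient estimate using +/-1 perturbations in every coordinate."""
    d = [rng.choice((-1.0, 1.0)) for _ in x]
    y_plus = response([xi + c_k * di for xi, di in zip(x, d)])
    y_minus = response([xi - c_k * di for xi, di in zip(x, d)])
    return [(y_plus - y_minus) / (2.0 * c_k * di) for di in d]

calls = {"n": 0}

def quadratic(x):
    """Hypothetical noise-free response with gradient 2 * (x - 1); counts its own calls."""
    calls["n"] += 1
    return sum((xi - 1.0) ** 2 for xi in x)

rng = random.Random(1)
x = [2.0, 0.0, 3.0]
single = spsa_gradient(quadratic, x, 0.1, rng)
evaluations_per_estimate = calls["n"]

# Averaging many independent estimates approaches the true gradient (2, -2, 4).
draws = [spsa_gradient(quadratic, x, 0.1, rng) for _ in range(4000)]
average = [sum(g[i] for g in draws) / len(draws) for i in range(len(x))]
print(evaluations_per_estimate, average)
```

A single estimate can be far from the true gradient because the other coordinates contribute cross terms; it is the averaging effect over iterations that the convergence theory exploits.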

The  efficiency  of  SA  algorithms  can  be  enhanced  further  by  the  availability  of  direct 
gradients;  this  led  to  a  flurry  of  research  in  more  advanced  gradient  estimation  techniques 
from  the  mid-1980s  through  the  present  day  [48].  Specific  gradient  estimation  techniques 
include Perturbation Analysis (PA) [57], Likelihood Ratios (LR) [52], and Frequency
Domain Experimentation (FDE) [124]. These methods often allow an estimate of the gradient
with only a single run of the simulation model. However, they require either knowledge of
the underlying structure of the stochastic system (for PA and LR) or additional
modifications to a model of the system (for FDE) [10]. Therefore, when coupled with SA, they are
not considered sampling-based methods since the model cannot be treated as a black-box
function  evaluator. 

A  well-established  convergence  theory  for  sampling-based  SA  methods  dates  back  to 
the  early  work  of  Kiefer  and  Wolfowitz  [68].  In  general,  FDSA  and  SPSA  methods  generate 
a  sequence  of  iterates  that  converges  to  a  local  minimizer  of  /  with  probability  1  (almost 
surely) when the following conditions (or similar conditions) are met [47]:

Gain sequences: $\lim_{k \to \infty} a_k = 0$, $\lim_{k \to \infty} c_k = 0$,
$\sum_{k=1}^{\infty} a_k = \infty$, and $\sum_{k=1}^{\infty} (a_k / c_k)^2 < \infty$.

Objective function regularity conditions: e.g., continuously differentiable
and convex or unimodal in a specified region of the search space.

Mean-zero noise: $E[\hat{\gamma}(X_k) - \nabla f(X_k)] = 0$ for all $k$ or in the limit as
$k \to \infty$.

Finite variance noise: variance of the noise in $\hat{\gamma}(X_k)$ is uniformly bounded.


The specific mathematical form of these conditions depends on algorithm
implementation, assumptions about the problem, and the method of proving convergence. For
coverage of the various approaches to the convergence theory, see [75], [83], or [134, Chap.
4, 6-7]. The restrictions on $a_k$ ensure that the sequence $\{a_k\}$ converges to zero, but
neither so fast that the iterates stall at a sub-optimal value nor so slowly that convergence
is prevented. The harmonic series, $a_k = a/k$ for some scalar $a$, is a common choice
[10, p. 318] for the gain sequence. In practice, the convergence rate is highly dependent on
the gain sequence, as algorithms may be extremely sensitive to the scalar parameter $a$ such
that a few steps in the wrong direction at the beginning may require many iterations to
correct [70]. The mean-zero noise requirement ensures that the gradient estimate
$\hat{\gamma}$ is an unbiased estimate of the true gradient, and the finite variance noise
requirement typically ensures that the variance of the noise in the gradient estimate cannot
grow any faster than a quadratic function of $x$
[134, p. 106].
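
As a worked check of the gain-sequence restrictions, consider the harmonic gain together with a power-law perturbation sequence. The summability condition used below, $\sum_k (a_k/c_k)^2 < \infty$, is a commonly stated form and is an assumption here, since the exact form depends on the convergence proof; the constants $a$, $c$, and the exponent $\beta$ are hypothetical.

```latex
\begin{align*}
a_k &= \frac{a}{k} \to 0, \qquad c_k = \frac{c}{k^{\beta}} \to 0 \quad \text{for } \beta > 0, \\
\sum_{k=1}^{\infty} a_k &= a \sum_{k=1}^{\infty} \frac{1}{k} = \infty, \\
\sum_{k=1}^{\infty} \left( \frac{a_k}{c_k} \right)^{2}
  &= \frac{a^2}{c^2} \sum_{k=1}^{\infty} \frac{1}{k^{\,2 - 2\beta}} < \infty
  \iff 2 - 2\beta > 1 \iff \beta < \tfrac{1}{2}.
\end{align*}
```

Under these assumptions any exponent $0 < \beta < 1/2$ satisfies all four restrictions, which shows that the perturbation size must shrink, but more slowly than the square root of the gain.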
Stochastic approximation methods have been modified over the years to enhance
performance using step size selection rules to accelerate convergence. One alternative employs
a line search, a commonly used globalization strategy in deterministic nonlinear
programming in which the minimum value of the objective function is sought along the search
direction. This has been analyzed for use in SA by Wardi [149], for example, using Armijo
step sizes. Another alternative uses iterate averaging, which incorporates the use of
information from previous iterations and allows the gain sequence $\{a_k\}$ to decrease to zero at a
slower rate than $1/k$. The analysis of Polyak and Juditsky [111] and Kushner and Yang [76]
shows how the slower decay rate of $\{a_k\}$ can actually accelerate SA algorithm convergence.
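
The idea of iterate averaging can be sketched on a scalar problem. Everything specific below is an assumption made for illustration: the objective (x - 1)^2, the unit-variance gradient noise, and the gain a/k^alpha with alpha = 0.7, which decays more slowly than 1/k. The reported solution is the running mean of the iterates rather than the last iterate.

```python
import random

def averaged_sa(x0, iterations, a=0.5, alpha=0.7, seed=0):
    """Scalar SA on f(x) = (x - 1)^2 with noisy gradients and a running average of iterates."""
    rng = random.Random(seed)
    x, x_bar = x0, 0.0
    for k in range(1, iterations + 1):
        grad = 2.0 * (x - 1.0) + rng.gauss(0.0, 1.0)  # hypothetical unbiased noisy gradient
        x -= (a / k ** alpha) * grad                  # gain decays more slowly than 1/k
        x_bar += (x - x_bar) / k                      # running mean of the first k iterates
    return x, x_bar

last_iterate, averaged_iterate = averaged_sa(3.0, 5000)
print(last_iterate, averaged_iterate)  # both should be near the minimizer x = 1
```

The larger steps keep the raw iterates moving, while the averaging step smooths out the noise they pick up along the way.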

Stochastic approximation methods have also been extended to handle more
complicated problems. For problems with constraints, the algorithms may be modified by using a
penalty or a projection constraint-handling approach. The penalty approach was analyzed
in a FDSA context by Kushner and Clark [75, Sec. 5.1, 5.4] and in a SPSA context by
Wang and Spall [148]. Using this approach, the objective function is augmented with the
addition of a penalty term,

$$f(x) + r_k P(x),$$

where the scalar $r_k > 0$ increases with $k$ and $P(x)$ is a term that takes on positive values
for violated constraints. Penalty terms are well-suited for problems in which some of the
constraint functions require noisy response evaluations from the model, since it cannot
be determined prior to simulation if a design is feasible with respect to these constraints.
However, as in the deterministic case, penalty methods suffer from computational difficulties
due to ill-conditioning for values of $r_k$ that are too large [25, p. 369]. Additionally, these
methods can produce a sequence of infeasible designs that converge to the optimal (feasible)
solution only in the limit, particularly for values of $r_k$ that are too small. If the sampling
budget is severely restricted, this can result in a terminal solution with significant constraint
violations because the algorithm was not allowed enough of a budget to approach the feasible
region.
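
The penalty construction can be illustrated with a short sketch. The quadratic form of the penalty and the bound data are assumptions made for this example (the source does not specify a particular penalty function), and deterministic linear constraints are used only to keep the example small.

```python
def penalty(x_cont, A, lower, upper):
    """Quadratic penalty: sum of squared violations of lower <= A x <= upper."""
    total = 0.0
    for row, lo, hi in zip(A, lower, upper):
        value = sum(a * x for a, x in zip(row, x_cont))
        total += max(0.0, lo - value) ** 2 + max(0.0, value - hi) ** 2
    return total

def penalized_objective(f_estimate, x_cont, r_k, A, lower, upper):
    """Augment an objective estimate with the weighted penalty term r_k * P(x)."""
    return f_estimate + r_k * penalty(x_cont, A, lower, upper)

# Hypothetical bounds 0 <= x1, x2 <= 10 written as rows of A.
A = [[1.0, 0.0], [0.0, 1.0]]
lower = [0.0, 0.0]
upper = [10.0, 10.0]

feasible_value = penalized_objective(3.0, [5.0, 5.0], 10.0, A, lower, upper)
infeasible_value = penalized_objective(3.0, [11.0, 0.0], 10.0, A, lower, upper)
print(feasible_value, infeasible_value)
```

A feasible design is left unchanged, while the infeasible design is charged in proportion to the weight, so a small weight leaves infeasible designs attractive and a large one makes the augmented function steep near the boundary.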

Projection approaches generate a sequence of feasible design points by replacing (2.1)
with

$$X_{k+1} = \Pi_{\Theta}(X_k - a_k \hat{\gamma}(X_k)), \qquad (2.4)$$

where $\Pi_{\Theta}$ denotes projection onto the feasible domain $\Theta$. Such methods are
analyzed in the FDSA context by Kushner and Clark [75, Sec. 5.3] and in the SPSA context by
Sadegh [118]. Projection methods are useful when all constraint functions are defined explicitly
in terms of the design variables so that response samples are not wasted in the process of
determining feasibility. However, these methods can typically handle only simple constraint
sets (e.g., bound and linear constraints) to facilitate mapping a constraint violation to the
nearest point in $\Theta$ [134, p. 195].
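
For the simplest constraint set, a box of bound constraints, the projection in (2.4) reduces to clamping each coordinate. The sketch below assumes bound constraints only; the function names and the numbers are hypothetical, and general linear constraints would require a more involved projection.

```python
def project_onto_box(x, lower, upper):
    """Euclidean projection onto {x : lower <= x <= upper}, i.e., componentwise clamping."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def projected_sa_step(x, grad_estimate, a_k, lower, upper):
    """One projected step in the spirit of (2.4) for a bound-constrained domain."""
    trial = [xi - a_k * gi for xi, gi in zip(x, grad_estimate)]
    return project_onto_box(trial, lower, upper)

# The unprojected trial point (-0.5, 10.5) violates both bounds and is mapped back.
next_point = projected_sa_step([0.5, 9.5], [2.0, -2.0], 0.5, [0.0, 0.0], [10.0, 10.0])
print(next_point)
```

No response samples are needed to decide feasibility here, which is the practical advantage the text attributes to projection when constraints are explicit in the design variables.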

Although  primarily  applicable  to  continuous  domains,  a  version  of  SPSA  has  been 
developed for application to discrete domains of only integer-valued variables [50, 51]. The
discrete version uses fixed gains (i.e., constant $a_k$ and $c_k$) and approximates the objective
function  with  a  smooth  continuous  function.  The  fixed  step  sizes  force  the  iterates  to  lie 
on  the  discrete-valued  grid  during  the  entire  search. 

2.1.2  Random  Search 

Random search methods sequentially step through the design space in a random manner
in search of better solutions. The general algorithm selects a candidate design point
probabilistically from the neighborhood of the incumbent design point and chooses the
incumbent or candidate as the next iterate based on specified criteria. An attractive feature
of random search methods is that the flexibility of the neighborhood construct allows for
the treatment of mixed variables, so they are very general. However, convergent versions
of random search exist primarily for discrete-only domains (e.g., [11]).

A general random search algorithm is shown in Figure 2.2. In the algorithm, $F(X_k)$
denotes an estimate of $f(X_k)$, perhaps a single sample or the mean of a number of samples
of $F(X_k, \omega)$. The algorithm relies on several user-defined features. In Step 1, a candidate
is drawn from a user-defined neighborhood $N(X_k)$ of the current iterate $X_k$. Step 1 also
requires the selection of a probability distribution that determines how the candidate is
chosen. Appropriate acceptance criteria must be defined in Step 2.

An advantage of random search is that the neighborhood $N(X_k)$ can be defined either
locally or globally throughout the design space. In fact, random search is a popular method
for global optimization (e.g., see [155]). In either case, $N(X_k)$ must be constructed to
ensure the design space is connected [11] (i.e., it is possible to move from any point in $\Theta$ to
any other point in $\Theta$ by successively moving between neighboring points). Neighborhood
construction depends in large part on the domain $\Theta$. Random search is flexible in that
it can accommodate domains that include any combination of continuous, discrete, and
categorical variables. For an entirely continuous $\Theta$, a local neighborhood may be defined
as an open ball of a specified radius about the incumbent (e.g., [19,87]). Alternatively, a
global definition may allow a neighbor to assume any value for each design variable within
a specified range if the problem’s only constraints are variable bounds (e.g., [131]). For an
entirely discrete $\Theta$, a local definition for $N(X_k)$ may include the nearest grid points (in a
Euclidean sense) from the incumbent (e.g., [7]), whereas a global definition may allow all
admissible combinations of discrete settings for the design vector as neighbors (e.g., [8,154]).
If $\Theta$ has both continuous and discrete components, a hybrid neighborhood structure can
be used (see [67] and [120]). Although the random search literature does not appear to
explicitly account for categorical variables in a mixed-variable context, the flexibility of
neighborhood structures certainly admits such a construct.

Random Search Algorithm

Initialization: Choose a feasible starting point $X_0 \in \Theta$ and generate an estimate $F(X_0)$. Set
suitable stopping criteria.

Set the iteration counter $k$ to 0.

1. Generate a candidate point $X'_k \in N(X_k) \subseteq \Theta$ according to some probability distribution and
generate an estimate $F(X'_k)$.

2. If $F(X'_k)$ satisfies the acceptance criteria, then set $X_{k+1} = X'_k$. Otherwise, set $X_{k+1} = X_k$.

3. If the stopping criteria are satisfied, then stop and return the estimate of the optimal
solution. Otherwise, update $k = k + 1$ and return to Step 1.

Figure 2.2. General Random Search Algorithm (adapted from [11])
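
The loop of Figure 2.2 can be sketched in a few lines once the three user-defined ingredients are supplied as functions. The sketch below is illustrative only: the names `random_search`, `noisy_quadratic`, and `grid_neighbor`, the integer test domain, and the fixed iteration budget used as the stopping rule are all assumptions made for the example.

```python
import random


def random_search(sample, neighbor, accept, x0, iterations, rng):
    """Generic random search loop in the spirit of Figure 2.2.

    sample(x, rng)            -> noisy estimate of f(x)
    neighbor(x, rng)          -> candidate drawn from the neighborhood of x
    accept(cand_est, inc_est) -> True if the candidate replaces the incumbent
    """
    x, est = x0, sample(x0, rng)
    for _ in range(iterations):          # Step 3: fixed budget as the stopping rule
        cand = neighbor(x, rng)          # Step 1: draw a candidate and estimate it
        cand_est = sample(cand, rng)
        if accept(cand_est, est):        # Step 2: acceptance test
            x, est = cand, cand_est
    return x


def noisy_quadratic(x, rng):
    """Hypothetical response: f(x) = (x - 7)^2 observed with Gaussian noise."""
    return (x - 7) ** 2 + rng.gauss(0.0, 0.5)


def grid_neighbor(x, rng):
    """Local neighborhood on the integer grid 0..20: one step left or right."""
    return min(20, max(0, x + rng.choice([-1, 1])))


rng = random.Random(0)
best = random_search(noisy_quadratic, grid_neighbor,
                     lambda cand, inc: cand < inc, x0=18, iterations=500, rng=rng)
print(best)
```

Swapping in a different `neighbor` function is all that is needed to move between local and global neighborhood definitions, or to handle categorical components.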

Once a neighborhood structure is determined, the method for sampling randomly from
the neighborhood must be defined. The simplest approach is a random draw uniformly
distributed so that each point in the neighborhood has equal probability of selection [134,
p. 38]. This method can be broadly implemented for either continuous or discrete domains.
As an alternative example of a local method in a continuous domain, Matyas [87] suggested
perturbing the incumbent design randomly, $X'_k = X_k + d_k$, where $d_k$ is distributed normally
with a zero mean vector and covariance matrix equal to the identity matrix $I$. That is,
each element of the design vector is randomly perturbed from its incumbent value according
to a normal distribution with mean zero and unit variance. Such blind search methods do
not use information learned during the search to improve neighbor selection. Additional
methods employ adaptive techniques that combine random sampling with knowledge gained
during the search to enhance selection.

Matyas [87] suggested a modification to the normally distributed perturbation vector
that allows the mean vector and correlation matrix of the perturbations to vary by
considering results of preceding iterations. Solis and Wets [131] present a similar method in which
the mean of the perturbation vector is a bias vector $b_k$, updated after every iteration, that
“slants the sampling in favor of the directions where success has been recorded” [131, p. 25].

The  acceptance  criteria  required  in  Step  2  of  Figure  2.2  are  the  most  critical  of  the 
user-defined  features  in  the  presence  of  noisy  responses.  For  the  deterministic  case,  these 
criteria may simply require improvement in the objective function, $f(X'_k) < f(X_k)$, where
$X'_k \in N(X_k)$. Alternatively, moves that fail to yield an improvement may be accepted with a
specified  probability  that  decreases  with  iteration  count,  such  as  in  simulated  annealing  [49]. 
Additional  considerations  are  required  for  noisy  response  functions  to  build  in  robustness 
to  the  noise. 

Two basic strategies discussed in [134, pp. 50-51] are averaging and acceptance thresholds.
Using averaging, the means of a number of response samples from the incumbent
and the candidate design points are used in place of true function values. The approach
more adequately accounts for variation by using an aggregate measure, but adds
computational expense. Using thresholding, a candidate design point is accepted if it satisfies
$F(X'_k, \omega) < F(X_k, \omega) - \tau_k$, where $\tau_k$ is an acceptance threshold. Using a threshold
approximately equal to two standard deviations of the estimated response noise implies that
only design points with two-sigma improvement are accepted. However, overly conservative
thresholds can lead to many rejections and therefore slow convergence.
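
The two strategies can be compared with a small experiment. The sketch below is a hedged illustration, not a reproduction of [134]: the function names, the test response `noisy_square`, and the sample sizes are assumptions. It estimates how often a candidate that is no better than the incumbent is accepted with and without a two-sigma threshold, and how reliably averaging accepts a genuinely better candidate.

```python
import random
import statistics


def accept_by_averaging(sample, candidate, incumbent, n, rng):
    """Accept if the mean of n candidate samples is below the mean of n incumbent samples."""
    cand_mean = statistics.fmean(sample(candidate, rng) for _ in range(n))
    inc_mean = statistics.fmean(sample(incumbent, rng) for _ in range(n))
    return cand_mean < inc_mean


def accept_by_threshold(sample, candidate, incumbent, tau, rng):
    """Accept if one candidate sample beats one incumbent sample by more than tau."""
    return sample(candidate, rng) < sample(incumbent, rng) - tau


def noisy_square(x, rng, sigma=1.0):
    """Hypothetical response: f(x) = x^2 observed with N(0, sigma^2) noise."""
    return x * x + rng.gauss(0.0, sigma)


rng = random.Random(1)
trials = 2000
# Candidate and incumbent share the same true value, so any acceptance is spurious.
rate_no_threshold = sum(
    accept_by_threshold(noisy_square, 1.0, 1.0, 0.0, rng) for _ in range(trials)) / trials
rate_two_sigma = sum(
    accept_by_threshold(noisy_square, 1.0, 1.0, 2.0, rng) for _ in range(trials)) / trials
# A genuinely better candidate (f = 0.25 versus f = 1) compared through 30-sample means.
rate_averaging = sum(
    accept_by_averaging(noisy_square, 0.5, 1.0, 30, rng) for _ in range(200)) / 200
print(rate_no_threshold, rate_two_sigma, rate_averaging)
```

The threshold lowers the spurious acceptance rate at the cost of rejecting some real improvements, while averaging buys reliability with additional samples per comparison.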

For  continuous  domains  and  noisy  response  functions,  formal  convergence  proofs  for 
random  search  methods  are  rare  [134,  p.  50].  Yakowitz  and  Fisher  [153,  Sect.  4]  provide 
an exception by establishing a convergent method via repeated sampling at design points
to minimize the effect of error. For discrete domains with a finite number of points, much
recent  work  has  led  to  several  convergent  methods.  A  number  of  specific  methods  that 
include  simulated  annealing  methods  are  discussed  in  [11]. 

In  an  entirely  discrete  domain,  the  random  search  framework  enables  the  sequence  of 
designs  visited  to  be  modeled  as  a  discrete  time  Markov  chain,  each  iterate  representing  a 
state visited by the chain. This fundamental property is key to proving asymptotic
convergence as the number of iterations goes to infinity. The strength of the result generally
depends on how the optimal solution is estimated; the usual choices are the most
frequently visited solution or the current solution under consideration [46].

Methods  that  estimate  the  solution  using  the  current  design  point  are  able  only  to 
show  that  the  sequence  of  iterates  converges  in  probability  to  an  optimal  solution;  i.e., 

$\lim_{k \to \infty} P\{X^*_k \in \Theta^*\} = 1$

where $\Theta^* \subset \Theta$ is the set of global optimal solutions and $X^*_k \in \Theta$ is the estimate of the
optimal  solution.  In  order  for  this  sequence  to  converge,  the  methods  require  statistical 
evidence  that  trial  moves  will  result  in  improvement,  where  the  strength  of  the  evidence 
grows  with  the  number  of  iterations  [11].  For  simulated  annealing  type  algorithms,  this 
is  accomplished  by  decreasing  the  temperature  parameter  to  zero  as  iterations  increase 
to  infinity.  For  more  traditional  random  search  methods,  this  is  accomplished  by  forcing 
candidate  solutions  to  pass  an  increasing  number  of  trials  as  iterations  accumulate.  The 
number  of  trials  per  iteration  increases  to  infinity  as  iterations  increase  to  infinity. 

Methods  that  use  the  most  frequently  visited  solution  as  the  estimated  optimal  solution 
do  not  require  the  progressively  conservative  moves  discussed  in  the  preceding  paragraph. 
In these cases, the sequence of iterates generated by the algorithm does not converge at
all (the iterates form an irreducible, time-homogeneous, and positive recurrent Markov chain) [11].
However, the sequence defined by $\{X^*_k\}$, where $X^*_k$ is the solution that the Markov chain
$\{X_k\}$ has visited most often after $k$ iterations, can be shown to converge almost surely to
an optimal solution [8]; i.e.,

$P\{\lim_{k \to \infty} I_{\{X^*_k \in \Theta^*\}} = 1\} = 1$

where the indicator $I_A$ equals one when the event $A$ occurs and zero otherwise. This is a
stronger  result  than  convergence  in  probability. 
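
The idea of reporting the most frequently visited solution can be illustrated with a short simulation. The sketch below is only loosely in the spirit of the methods surveyed in [11] and is not a reproduction of any of them: the function name `most_visited_search`, the uniform choice among all other points, and the single-sample comparison rule are assumptions made for illustration.

```python
import random
from collections import Counter


def most_visited_search(sample, points, x0, iterations, rng):
    """Random search on a finite domain that reports the most frequently visited point.

    Each move compares one fresh sample at the incumbent with one fresh sample at a
    uniformly drawn alternative, so the iterates keep wandering; only the visit
    counts settle down.
    """
    visits = Counter()
    x = x0
    for _ in range(iterations):
        cand = rng.choice([p for p in points if p != x])
        if sample(cand, rng) < sample(x, rng):
            x = cand
        visits[x] += 1
    return visits.most_common(1)[0][0], visits


rng = random.Random(3)
# Hypothetical test problem: f(x) = |x - 2| on {0, ..., 5} with unit-variance noise.
best, visits = most_visited_search(
    lambda x, r: abs(x - 2) + r.gauss(0.0, 1.0), list(range(6)), x0=5,
    iterations=5000, rng=rng)
print(best, dict(visits))
```

The acceptance rule never becomes more conservative, yet the visit counts concentrate on the best point, which is the behavior the almost sure convergence result formalizes.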

2.1.3  Ranking  and  Selection 

Ranking  and  selection  (R&S)  procedures  are  “statistical  methods  specifically  developed 
to  select  the  best  system,  or  a  subset  of  systems  that  includes  the  best  system,  from 
a  collection  of  competing  alternatives”  [53,  p.  273].  These  methods  are  analogous  to 
exhaustive  enumeration  of  combinatorial  optimization  problems  in  which  each  of  a  small 
number  (<  20)  of  alternatives  can  be  simulated.  Ranking  and  selection  procedures  are 
typically  grouped  into  a  larger  class  of  statistical  procedures  that  also  includes  multiple 
comparison procedures [53]. The coverage of R&S procedures in this literature review
results  from  the  fact  that  they  have  recently  been  incorporated  within  iterative  search 
routines  applied  to  stochastic  optimization  via  simulation,  which  is  also  how  they  are  used 
in  this  research. 

Two general R&S approaches are indifference zone and subset selection [46]. Indifference-zone
procedures guarantee selection within $\delta$ of the true best solution with user-specified
probability $1 - \alpha$, where $\delta$ represents a measure of practical difference known as the
indifference zone. The parameter $\delta$ is called the indifference zone parameter. These approaches,
using a single stage or multiple stages of sampling, collect response samples from the
alternatives, check certain stopping criteria, then either continue sampling or stop and select
the alternative with the smallest response estimate in the final stage [139]. The original
procedure by Bechhofer [26] is a single-stage procedure in which the number of samples
required of each solution is determined a priori according to a tabular value related to the
experimenter’s choice of $\delta$ and $\alpha$. Bechhofer’s method assumed a known and equal variance
in response samples across all alternatives. Dudewicz and Dalal [42] and Rinott [114]
extended the approach to problems with unknown and unequal response variances by using
an initial stage of sampling to estimate variances. These estimates are used to prescribe
the number of second-stage samples needed to ensure the probability of correct selection.
This concept can be extended to many stages in which the early stages use a predetermined
number of samples in order to estimate the number of samples required in the final stage
to make a selection. Subset selection is very similar to indifference-zone selection, with the
exception that a selected subset of at most $m$ systems will contain at least one system with
a response within $\delta$ of the optimal value.
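
A two-stage procedure of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [42] or [114] as published: the sample-size rule max(n0, ceil(h^2 S_i^2 / delta^2)) is the form in which the two-stage rule is commonly stated and should be checked against the cited sources, the constant `h` is normally read from published tables and is simply passed in here, and the value `h=3.0` in the example is a placeholder rather than a tabulated value.

```python
import math
import random
import statistics


def two_stage_select(sample, designs, n0, h, delta, rng):
    """Two-stage indifference-zone selection for minimization.

    h is the tabulated constant for the chosen (n0, number of designs, alpha);
    it is an input here and is not computed.
    """
    means, sizes = [], []
    for x in designs:
        first = [sample(x, rng) for _ in range(n0)]
        s2 = statistics.variance(first)                       # first-stage variance estimate
        total = max(n0, math.ceil(h * h * s2 / (delta * delta)))
        extra = [sample(x, rng) for _ in range(total - n0)]   # second-stage samples
        means.append(statistics.fmean(first + extra))
        sizes.append(total)
    best_index = min(range(len(designs)), key=means.__getitem__)
    return designs[best_index], sizes


rng = random.Random(2)
# Hypothetical alternatives with true means 0, 4, and 9 and unit-variance noise.
choice, sizes = two_stage_select(
    lambda x, r: x * x + r.gauss(0.0, 1.0), [0.0, 2.0, 3.0],
    n0=10, h=3.0, delta=0.5, rng=rng)
print(choice, sizes)
```

Because the second-stage size grows with the estimated variance and shrinks with the indifference zone, noisy alternatives and small values of delta both drive up the sampling cost.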

To define the requirements for a general indifference-zone R&S procedure, consider a
finite set $\{X_1, X_2, \ldots, X_{n_C}\}$ of $n_C \ge 2$ candidate design points. For each $i = 1, 2, \ldots, n_C$,
let $f_i = f(X_i) = E[F(X_i, \omega)]$ denote the true objective function value. The $f_i$ values can
be ordered from minimum to maximum as

$f_{[1]} \le f_{[2]} \le \cdots \le f_{[n_C]}$.

The notation $X_{[i]}$ indicates the candidate with the $i$th best (lowest) true objective function
value. If at least one candidate has a true mean within $\delta$ of the true best, i.e., $f_{[i]} - f_{[1]} < \delta$
for some $\delta > 0$ and $i \ge 2$, then the procedure is indifferent in choosing $X_{[1]}$ or $X_{[i]}$ as
the best. The probability of correct selection (CS) is defined in terms of $\delta$ and the
significance level $\alpha \in (0, 1)$, as

$P\{CS\} = P\{\text{select } X_{[1]} \mid f_{[i]} - f_{[1]} \ge \delta,\ i = 2, \ldots, n_C\} \ge 1 - \alpha$, (2.5)

where $\delta$ and $\alpha$ are user specified. Since $P\{CS\} = 1/n_C$ is guaranteed simply by choosing
randomly from the alternatives, the significance level must satisfy $0 < \alpha < 1 - 1/n_C$.

Traditional multi-stage indifference-zone procedures can be too computationally
cumbersome to accommodate a large set of candidates because they are based on the least
favorable configuration assumption that the best candidate has a true mean exactly $\delta$
better than all remaining candidates, which are tied for second best [140]. As a result, the
procedures can overprescribe the number of samples required in the final stage in order to
guarantee that (2.5) holds. Two recent directions in R&S research reflect attempts to
address this issue. The first has been to combine a search strategy with R&S to enable a global
search of a possibly large solution space. As examples, Ólafsson [102] and Pichitlamken and
Nelson [110] each introduce an iterative technique that combines R&S with a global
optimization strategy known as nested partitioning (NP), which is used to adaptively search
the feasible space of (possibly large) combinatorial problems. In each approach, a discrete
time  Markov  chain  analysis  is  used  to  show  almost  sure  convergence  to  a  global  optimum 
of  the  discrete  and  finite  variable  space.  Ahmed  and  Alkhamis  [4]  describe  and  analyze  a 
globally  convergent  algorithm  that  embeds  R&S  procedures  within  simulated  annealing  for 
optimization  over  a  discrete  domain.  Boesel  et  al.  [29]  and  Hedlund  and  Mollaghasemi  [56] 
combine  R&S  procedures  with  genetic  algorithms. 

The  second  trend  has  been  to  invent  more  modern  procedures  that,  through  enhanced 
efficiency  in  terms  of  sampling  requirements,  can  accommodate  a  larger  number  of  solutions. 
One  such  procedure  combines  subset  selection  with  indifference-zone  selection  as  a  means 
to screen out noncompetitive solutions and then select the best from the survivors. A
general theory is presented by Nelson et al. [99] that balances computational and statistical
efficiency.  This  approach  maintains  a  probability  guarantee  for  selecting  the  best  solution 
when  using  the  combined  procedure.  Another  procedure,  by  Kim  and  Nelson  [69],  is  a 
so-called  fully  sequential  procedure,  which  is  one  that  takes  one  sample  at  a  time  from 
every  alternative  still  in  play  and  eliminates  clearly  inferior  ones  as  soon  as  their  inferiority 
is  apparent.  After  an  initial  stage  of  sampling,  a  sequence  of  screening  steps  eliminates 
alternatives  whose  cumulative  sums  exceed  the  best  of  the  rest  plus  a  tolerance  level. 
Between  each  successive  screening  step,  one  additional  sample  is  taken  from  each  survivor 
and  the  tolerance  level  decreases. 
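
The screening loop of a fully sequential procedure can be sketched as below. This is an illustration of the mechanics only: the tolerance schedule is a user-supplied, hypothetical function `tolerance(r)`, so the sketch carries none of the probability guarantees of the procedure in [69], whose tolerance is derived from first-stage variance estimates and the chosen indifference zone.

```python
import random


def sequential_screen(sample, designs, n0, tolerance, max_stage, rng):
    """Fully sequential elimination for minimization.

    tolerance(r) is a nonincreasing allowance on cumulative sums after r samples
    per survivor (hypothetical schedule; no statistical guarantee is implied).
    """
    sums = {i: sum(sample(x, rng) for _ in range(n0)) for i, x in enumerate(designs)}
    r = n0
    while len(sums) > 1 and r < max_stage:
        best = min(sums.values())
        # Drop every survivor whose cumulative sum exceeds the best sum plus the tolerance.
        sums = {i: s for i, s in sums.items() if s <= best + tolerance(r)}
        for i in sums:                        # one additional sample from each survivor
            sums[i] += sample(designs[i], rng)
        r += 1
    winner = min(sums, key=sums.get)
    return designs[winner], r


rng = random.Random(4)
# Hypothetical alternatives with true means 0, 4, and 9 and unit-variance noise.
winner, stages = sequential_screen(
    lambda x, r: x * x + r.gauss(0.0, 1.0), [0.0, 2.0, 3.0],
    n0=5, tolerance=lambda r: max(0.0, 20.0 - 0.5 * r), max_stage=200, rng=rng)
print(winner, stages)
```

Clearly inferior alternatives are dropped after only a few samples, which is the source of the sampling efficiency relative to two-stage procedures.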

Categorical  variables  are  readily  handled  by  modern  R&S  techniques  since  all  design 
alternatives are determined a priori and corresponding variable values can be set accordingly.
However, the limited capacity of R&S restricts the number of solutions that can be
considered to a discrete grid of points in the solution space, so that thorough exploration
of this space is not possible. The existing provably convergent techniques [4, 102, 110] that
combine R&S with adaptive search currently address entirely discrete domains. Continuous
variables can be dealt with via a discretization of the variable space, but this can lead
to  a  combinatorial  explosion  of  the  search  space  and  an  increase  in  computational  expense. 

2.1.4  Direct  Search 

Direct  search  methods  involve  the  direct  comparison  of  objective  function  values  and 
do  not  require  the  use  of  explicit  or  approximate  derivatives.  For  this  reason,  they  are  easily 
adapted  to  black-box  simulation,  demonstrating  some  inherent  generality  properties.  This 
feature  has  led  to  their  use  as  sampling-based  methods  for  stochastic  optimization,  which 
is  documented  in  this  section.  However,  direct  search  methods  for  stochastic  optimization 
have only considered unconstrained problems with continuous variables. The GPS class of
algorithms, which is a subset of direct search methods, is more general than the classical
methods covered in this section, but has yet to be applied to stochastic problems. Since GPS
methods  are  the  cornerstone  of  this  research,  they  are  covered  in  more  detail  in  Section  2.2. 

Interestingly,  direct  search  methods  evolved  from  efforts  to  optimize  systems  involving 
random  error.  In  conjunction  with  his  early  work  in  the  field  of  RSM,  Box  proposed  a 
method  for  improving  industrial  efficiency  known  as  evolutionary  operation  (EVOP)  [33] 
in  the  mid-1950s.  Intended  as  a  simple  tool  that  could  be  used  by  plant  personnel,  the 
estimation  of  regression  models  was  replaced  by  direct  inspection  of  data  according  to  the 
“patterns”  of  the  experimental  design  [144].  Spendley,  Hext,  and  Himsworth  [136]  suggested 
an  automated  procedure  of  EVOP  for  use  in  numerical  optimization  and  replaced  factorial 
designs  with  simplex  designs.  Several  more  direct  search  methods  were  proposed  in  the 
1960s  [152]  and  include  the  well-known  direct  search  method  of  Hooke  and  Jeeves  [60] 
and  the  simplex  method  of  Nelder  and  Mead  [98],  which  is  an  extension  of  the  method 
of  Spendley  et  al.  At  the  time,  these  methods  were  considered  heuristics  with  no  formal 
convergence theory [1, p. 22]. Research on direct search methods faded during the 1970s
and 1980s but was revived in the 1990s with the introduction and convergence analysis of
the  class  of  pattern  search  methods  for  unconstrained  optimization  problems  by  Torczon 
[143]. 

In  general,  traditional  direct  search  methods  are  applicable  to  continuous  domains 
and  are  easily  adapted  to  stochastic  optimization  because  they  rely  exclusively  on  response 
function  samples.  Perhaps  the  most  frequently  used  direct  search  methods  for  stochastic 
optimization  are  the  Nelder-Mead  simplex  search  and  Hooke-Jeeves  pattern  search.  The 
Nelder-Mead method conducts a search by repeatedly dropping the worst point from a
simplex of $n + 1$ points and adding a new point. A simplex is a convex hull of a set of
$n + 1$ points not all lying in the same hyperplane in $\mathbb{R}^n$ [24, p. 97]. During a search
iteration, the geometry of the simplex is modified by expansion, reflection, contraction, or
shrinking operations that are triggered based on the relative rank of the point added at that
iteration. Figure 2.3, based on [141, pp. 5-7], depicts the Nelder-Mead search procedure.
In the algorithm, $F(X_k)$ denotes an estimate of $f(X_k)$, perhaps a single sample or the
mean of a number of samples of $F(X_k, \omega)$. The algorithm relies on several parameters that
determine how the geometry is refined during the search. Common parameter choices are
$\eta = 1$, $\gamma = 2$, $\beta = \frac{1}{2}$, and $\kappa = \frac{1}{2}$.


Nelder-Mead Search

Initialization: Choose a simplex of feasible points $S_0 = \{X_1, X_2, \ldots, X_{n+1}\} \subset \Theta$ and generate
estimates $F(X_1), F(X_2), \ldots, F(X_{n+1})$. Reorder the set $S_0$ so that $F(X_1) \le F(X_2) \le \cdots \le
F(X_{n+1})$. Set reflection parameter $\eta$, expansion parameter $\gamma$, contraction parameter $\beta$, and shrink
parameter $\kappa$. Set suitable stopping criteria.

Set the iteration counter $k$ to 0.

1. If necessary, reorder $S_k$ so that $F(X_1) \le F(X_2) \le \cdots \le F(X_{n+1})$. Find the centroid of the $n$
best points in $S_k$, $X_{cen} = \frac{1}{n} \sum_{i=1}^{n} X_i$. Generate reflected point $X_{ref} = (1 + \eta) X_{cen} - \eta X_{n+1}$ and
estimate $F(X_{ref})$.

2. If $F(X_{ref}) > F(X_1)$, then go to Step 3. Otherwise, generate expansion point $X_{exp} = \gamma X_{ref} + (1 -
\gamma) X_{cen}$ and estimate $F(X_{exp})$. If $F(X_{exp}) < F(X_1)$, then set $X_{n+1} = X_{exp}$; otherwise, set $X_{n+1} = X_{ref}$.
Go to Step 6.

3. If $F(X_{ref}) > F(X_n)$, then go to Step 4. Otherwise, set $X_{n+1} = X_{ref}$ and go to Step 6.

4. If $F(X_{ref}) < F(X_{n+1})$, then set $X_{n+1} = X_{ref}$. Otherwise, retain the current $X_{n+1}$. Go to Step 5.

5. Generate contraction point $X_{con} = \beta X_{n+1} + (1 - \beta) X_{cen}$ and estimate $F(X_{con})$. If $F(X_{con}) <
F(X_{n+1})$, then set $X_{n+1} = X_{con}$; otherwise, shrink the entire simplex toward $X_1$ by setting $X_i =
\kappa X_i + (1 - \kappa) X_1$ for $i = 2, \ldots, n + 1$. Go to Step 6.

6. If the stopping criteria are satisfied, then stop and return $X_1$ as the estimate of the optimal solution.
Otherwise, set $S_{k+1} = S_k$ and reorder (if necessary) so that $F(X_1) \le F(X_2) \le \cdots \le F(X_{n+1})$.
Update $k = k + 1$ and return to Step 1.

Figure  2.3.  Nelder-Mead  Search  (adapted  from  [141]) 
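
Steps 1 to 6 of Figure 2.3 can be sketched compactly in code. The version below is a hedged illustration rather than the implementation of [141]: the function name `nelder_mead`, the fixed iteration budget used as the stopping rule, and the convention that an accepted reflection or expansion replaces the worst vertex are assumptions. The `estimate` argument stands for whatever noisy estimator of the objective is available; a deterministic quadratic is used in the example so that the behavior is easy to check.

```python
def nelder_mead(estimate, simplex, eta=1.0, gamma=2.0, beta=0.5, kappa=0.5, iterations=200):
    """Follow Steps 1-6 of Figure 2.3 with a fixed iteration budget.

    estimate(x) returns a (possibly noisy) estimate of f(x); simplex is a list of
    n + 1 points, each a list of n floats.
    """
    pts = [(estimate(x), list(x)) for x in simplex]
    for _ in range(iterations):
        pts.sort(key=lambda p: p[0])                          # Step 1: order the vertices
        n = len(pts) - 1
        cen = [sum(p[1][j] for p in pts[:n]) / n for j in range(n)]
        worst = pts[n][1]
        ref = [(1 + eta) * c - eta * w for c, w in zip(cen, worst)]
        f_ref = estimate(ref)
        if f_ref <= pts[0][0]:                                # Step 2: try an expansion
            exp = [gamma * r + (1 - gamma) * c for r, c in zip(ref, cen)]
            f_exp = estimate(exp)
            pts[n] = (f_exp, exp) if f_exp < pts[0][0] else (f_ref, ref)
        elif f_ref <= pts[n - 1][0]:                          # Step 3: accept the reflection
            pts[n] = (f_ref, ref)
        else:
            if f_ref < pts[n][0]:                             # Step 4: keep the better point
                pts[n] = (f_ref, ref)
            worst = pts[n][1]
            con = [beta * w + (1 - beta) * c for w, c in zip(worst, cen)]   # Step 5
            f_con = estimate(con)
            if f_con < pts[n][0]:
                pts[n] = (f_con, con)
            else:                                             # shrink toward the best vertex
                best = pts[0][1]
                shrunk = [[kappa * xi + (1 - kappa) * bi for xi, bi in zip(p[1], best)]
                          for p in pts[1:]]
                pts = [pts[0]] + [(estimate(y), y) for y in shrunk]
    pts.sort(key=lambda p: p[0])
    return pts[0][1], pts[0][0]


def quad(x):
    """Hypothetical noise-free test function with minimizer (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_best, f_best = nelder_mead(quad, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(x_best, f_best)
```

With a noisy `estimate`, the stored vertex values are single draws, so the rank comparisons in Steps 2 to 5 can be wrong; the variants discussed in the text address exactly this weakness.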

Although the Nelder-Mead algorithm possesses no general convergence theory [152], it
generates a search path with some inherent robustness for problems with noisy responses,
due to its reliance on the relative ranks of the vertices in the simplex [142] (as opposed to
precise estimates). However, this very feature may also cause the algorithm to terminate
prematurely. Premature termination results when large random disturbances in the
functional response change the relative ranks of the function values in the simplex and
inappropriately affect the scaling steps [23]. An early attempt to mitigate inappropriate scaling was
carried out by Barton [20], who compared a variant of the method to three other methods
(including an unmodified Hooke-Jeeves search) to minimize functions with random noise
using Monte Carlo simulation. In Barton’s approach, points are sampled once; however, a
reflected point is resampled if it is found worse than the two poorest values from the
previous simplex. After resampling, if a different point is the worst, then the new worst point is
used in a new reflection operation. Barton and Ivey [22,23] introduced three Nelder-Mead
variants with the goal of avoiding false convergence. The first variant (called S9) simply
increases the shrink parameter $\kappa$ from 0.5 to 0.9. The second variant (RS) resamples the
best point after a shrink step before determining the next reflection. The third variant
(PC) resamples $X_{ref}$ and $X_n$ if a contraction is indicated in Step 3 in Figure 2.3, and these
two points are compared to each other again without reordering the ranks of the
remaining points (if they change). If $F(X_{ref}) > F(X_n)$ still holds, then contraction is performed
as normal; otherwise, $X_{ref}$ is accepted as the new point in the simplex and contraction is
bypassed. Based on empirical results, Barton and Ivey concluded that a combination of
variants S9 and RS provides statistically significant improvements over the unmodified
procedure, reducing the deviation between the terminal solution and known optimal solution
relative to the standard method by an average of 15% over 18 test problems.

Tomick et al. [142] suggested further modifications to Nelder-Mead for noisy responses.
In their approach, each point in the simplex is averaged over $m_k$ samples per iteration, where
$m_k$ is adjusted from the previous iteration based on a statistical test on the hypothesis of
equal response means across all points of the simplex. If the hypothesis is accepted (rejected),
sample  size  is  increased  (decreased)  by  a  constant  factor.  This  method,  which  includes 
the  shrink  parameter  increase  (S9)  of  Barton  and  Ivey  [22,23],  reduced  the  deviation  of 
the  terminal  solution  from  the  known  optimal  solution  to  less  than  20%  of  the  starting 
value  for  each  of  the  18  test  problems.  Finally,  Humphrey  and  Wilson  [61,62]  present  a 
Nelder-Mead  variant  with  three  phases  in  which  (a)  the  terminal  point  from  one  phase 
becomes  the  starting  point  for  the  next  phase,  (b)  the  distance  between  initial  simplex 
points  decreases  geometrically  and  the  shrink  parameter  increases  linearly  with  each  phase, 
and  (c)  the  solution  is  taken  as  the  best  of  the  terminal  points  from  the  three  phases.  Each 
phase  represents  a  restart  of  the  basic  Nelder-Mead  procedure  where  the  increase  in  the 
shrink  parameter  serves  to  protect  against  premature  termination.  In  comparison  to  the 
algorithm  of  Barton  and  Ivey  [22, 23]  (that  included  the  RS  and  S9  modifications)  for  six 
test  problems  with  known  solutions,  this  procedure  found  a  more  accurate  solution  for  five 
of  them  while  expending  approximately  equal  computational  effort. 

The Hooke-Jeeves method conducts a search via a series of exploratory and pattern
moves through the solution space. During a search iteration, exploratory moves are
conducted locally along the coordinate axes, and pattern moves are conducted along the
direction defined by the starting and ending points of exploratory moves. In a simulation-based
application, Nozari and Morris [101] applied the Hooke-Jeeves pattern search in
conjunction with the two-stage R&S procedure of Dudewicz and Dalal [42]. In the approach, the
R&S procedure is used in the exploratory search step in order to find which candidate along
any of the coordinate axes produces the best solution. The direction from the incumbent
to the chosen candidate solution is then used as the pattern search direction. Nandkeolyar
and Christy [96] implemented a Hooke-Jeeves algorithm with a modified step size
update rule in which only statistically significant improvements in the response function are
recognized. Pegden and Gately implemented an optimization module using Hooke-Jeeves
pattern search in the GASP [106] and SLAM [107] simulation languages. In both
implementations, a standard statistical test is used in the comparison of response means before
selecting new iterates. The method starts, stops, and continues one long simulation run for
each design point, comparing the means of numerous batches, until the difference in means
is statistically significant. In addition to Barton [20], Lacksonen [77] evaluated the
Hooke-Jeeves method in a comparison with other methods. Lacksonen increased the number of
samples  for  candidate  points  as  the  step  length  parameter  decreased  in  order  to  improve 
precision  of  the  estimate  but  no  formal  statistical  test  was  used.  Sample  sizes  of  one,  four, 
and  seven  were  used  for  the  prespecified  step  length  values,  terminating  the  algorithm  when 
exploratory  search  failed  to  find  an  improving  solution. 
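
The interplay of exploratory and pattern moves can be sketched as follows. This is a basic textbook-style Hooke-Jeeves loop written for illustration; the function name `hooke_jeeves`, the step-halving rule, and the plain `<` comparison of estimates are assumptions. The variants cited above would replace that comparison with a statistical test or an R&S procedure, and a noise-free test function is used here so that the result can be verified by hand.

```python
def hooke_jeeves(estimate, x0, step=1.0, shrink=0.5, min_step=1e-3):
    """Basic Hooke-Jeeves search; estimate(x) returns a (possibly noisy) estimate of f(x)."""

    def explore(base, f_base, h):
        """Exploratory moves: try +h and -h along each coordinate axis in turn."""
        x, fx = list(base), f_base
        for j in range(len(x)):
            for d in (h, -h):
                trial = list(x)
                trial[j] += d
                f_trial = estimate(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base, f_base = list(x0), estimate(x0)
    while step >= min_step:
        x, fx = explore(base, f_base, step)
        if fx < f_base:
            while True:
                # Pattern move: extrapolate along the direction from the old base to x.
                pattern = [2 * xi - bi for xi, bi in zip(x, base)]
                base, f_base = x, fx
                x, fx = explore(pattern, estimate(pattern), step)
                if fx >= f_base:
                    break
        else:
            step *= shrink            # exploratory search failed: refine the step length
    return base, f_base


def quad(x):
    """Hypothetical noise-free test function with minimizer (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_best, f_best = hooke_jeeves(quad, [0.0, 0.0])
print(x_best, f_best)
```

Because every decision is a pairwise comparison of estimates, each `<` in the sketch marks a place where noise can mislead the search and where the statistical safeguards described above would be inserted.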

Direct search methods do not possess a general convergence theory in the stochastic
setting, with one notable exception. In [6], Anderson and Ferris introduce a search
algorithm that operates on a set of points (called a structure) in a continuous variable domain
with noisy function evaluations. The algorithm converges almost surely to a stationary
point of a uniformly Lipschitz, continuously differentiable objective function. The
operations on the structure are similar to those of the Nelder-Mead algorithm but differ in that,
for each reflection, expansion, and contraction operation, all points except the best point
of the structure are repositioned, whereas, in Nelder-Mead, only a single point is reflected
or expanded. A key assumption in algorithm convergence is that the random error in the
responses tends to zero faster than the step length (representing the size of the structure).
In practice, this is accomplished via increased samples. Interestingly, the authors note that
the convergence proof is dependent on the characterization of random error and, in fact,
fails in the absence of error. In these cases, they claim that the method is a generalized
pattern search method¹ and convergence is guaranteed by the analysis of [143].

¹Note: This is true if the incumbent solution is defined as the centroid of the structure and candidate
solutions are the surrounding points of the structure.

2.1.5 Response Surface Methods

Response surface methods (RSM) are broadly defined as “statistical and mathematical
techniques useful for developing, improving, and optimizing processes” [94, p. 1]. When
used for optimization, these methods fit a smooth surface to response values obtained from
a sampling of design points. This surface, called a response surface, metamodel, or surrogate
function, may be searched inexpensively using traditional deterministic methods in order to
explore the search space. Since constructing response surfaces depends on the availability
of response samples, RSM is easily applied to black-box simulation. These methods can
be applied directly to solve stochastic optimization problems or can be used to augment
more rigorous procedures as a means to improve the search. For example, Booker et al.
[32] describe the use of response surfaces within a pattern search framework as a means
to accelerate the search. Since response surfaces will be used in the implementation of the
algorithms developed in this research (Section 4.2), a broad coverage of RSM for stochastic
optimization is presented in this section.

Due to the breadth of its application, the research literature on RSM is vast. In
application to stochastic optimization via simulation, its history dates back to the early
1970s (e.g., [123,130]). The basic RSM approach calls for solving a sequence of optimization
problems in which the true objective function is approximated by a response surface. The
process begins by establishing a prespecified number of design points in a region of the
search  space  according  to  an  experimental  design.  Sampled  responses  are  obtained  for  each 
point  and  a  local  response  surface  is  built  in  the  region.  Information  from  the  response 
surface  is  used  to  guide  the  search  to  a  new  region  and  the  process  is  repeated  until  reaching 
a  stopping  criterion.  Alternatively,  the  response  surface  can  globally  approximate  the  true 
objective  function,  and  intermediate  design  points  sampled  during  the  search  can  be  used 
to  enhance  the  accuracy  of  the  global  approximation.  The  primary  issues  involved  in  the 
process  include  [21]: 

•  the  choice  of  a  functional  form  of  the  response  surface, 

•  the  choice  of  an  experimental  design  to  select  points  from  the  design  space,  and 

•  the method for assessing the adequacy of the fitted model (e.g., lack of fit or
mean squared error).

To fit a response surface, response samples must be collected from some set of design
sites (points in the design space) $X_1, \ldots, X_N$. Let $F_i$ denote the response at site $X_i$, where
$F_i$ may represent a single response or the mean of a set of responses. The input/output
relationship for the data points $\{(X_i, F_i)\}_{i=1}^{N}$ is often modeled as a deviation from the true
objective function,

$$F_i = f(X_i) + \varepsilon_i, \quad i = 1, \ldots, N,$$

with observation errors $\varepsilon_i$. Methods for fitting the data points to a response can be divided
into two general classes, parametric and nonparametric methods [55, p. 4]. Parametric
methods assume the underlying function $f$ has a prespecified functional form (e.g., a
polynomial) fully described by a set of parameters. Nonparametric methods make minimal
assumptions regarding the structure of $f$. Examples of each class of methods will be briefly
described in the following paragraphs.

Traditional response surface methods (e.g., [94]) typically use parameterized
polynomials where regression is used to fit the response surface. These methods typically fit a
function $f$ using a linear model (2.6) or a quadratic model (2.7),

$$f(x) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \qquad (2.6)$$

$$f(x) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} \beta_{ij} x_i x_j \qquad (2.7)$$

where the parameters $\beta_0$, $\beta_i$, and $\beta_{ij}$, $i = 1, \ldots, n$, $j = i, \ldots, n$, are determined through
least squares regression which minimizes the sum of the squared deviations of the predicted
values from the actual values [94]. After the response surface is built, the method of
steepest descent is typically used, where the search direction is chosen as the negative of
the gradient. Under the linear model, the gradient is simply $\nabla f(x) = [\beta_1, \beta_2, \ldots, \beta_n]^T$;
under the quadratic model, $\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]^T$, where

$$\frac{\partial f}{\partial x_i} = \beta_i + 2\beta_{ii} x_i + \sum_{j=1,\, j \ne i}^{n} \beta_{ij} x_j$$

(reading $\beta_{ij}$ as $\beta_{ji}$ when $j < i$).
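As an illustration of the quadratic model (2.7) and its gradient, the following minimal sketch fits the coefficients by least squares (via the normal equations) and forms a steepest descent direction. The function names, the 3 x 3 factorial design, and the noise-free test function `true_f` are assumptions made for illustration only; they do not come from the cited references, and a practical implementation would use noisy responses and a numerically sturdier solver.

```python
def quad_features(x):
    """Regressors [1, x_1..x_n, x_i*x_j for j >= i], matching model (2.7)."""
    n = len(x)
    feats = [1.0] + [float(v) for v in x]
    for i in range(n):
        for j in range(i, n):
            feats.append(x[i] * x[j])
    return feats


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            factor = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, m))
        z[r] = (M[r][m] - s) / M[r][r]
    return z


def fit_quadratic(sites, responses):
    """Least-squares coefficients [b0, b_i..., b_ij...] from the normal equations."""
    Phi = [quad_features(x) for x in sites]
    p = len(Phi[0])
    AtA = [[sum(row[a] * row[b] for row in Phi) for b in range(p)] for a in range(p)]
    Atb = [sum(row[a] * y for row, y in zip(Phi, responses)) for a in range(p)]
    return solve(AtA, Atb)


def gradient(beta, x):
    """Gradient of the fitted quadratic surface at x."""
    n = len(x)
    g = list(beta[1:n + 1])          # linear terms b_i
    idx = n + 1
    for i in range(n):
        for j in range(i, n):
            b = beta[idx]
            idx += 1
            if i == j:
                g[i] += 2.0 * b * x[i]
            else:                    # cross term b_ij * x_i * x_j
                g[i] += b * x[j]
                g[j] += b * x[i]
    return g


def true_f(x):
    """Hypothetical noise-free response used only to exercise the fit."""
    return 1 + 2 * x[0] - 3 * x[1] + x[0] ** 2 + 0.5 * x[0] * x[1] + 2 * x[1] ** 2


sites = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
responses = [true_f(x) for x in sites]
beta = fit_quadratic(sites, responses)
descent_direction = [-g for g in gradient(beta, (1.0, 1.0))]
```

With noisy responses, each entry of `responses` would typically be a sample mean over several replications at the corresponding design site.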

In application to simulation-based optimization, much of the research in polynomial-based
RSM prior to 1990 is summarized in Jacobson and Schruben [63], in which several
improvements are discussed such as screening for variable reduction, allowance for multiple
objectives, constraint-handling via the methods of feasible directions and gradient
projection, variance reduction via common and antithetic pseudorandom numbers, and the effects
of alternative experimental designs. More recently, Joshi et al. [66] introduced gradient
deflection and second-order search strategies to the RSM approach. This method retains
information from previous iterations and builds knowledge of second-order curvature of the
objective function, thereby avoiding the zigzagging experienced by steepest descent.
Another approach by Angün et al. [12] generalizes the steepest descent search
direction to multiple responses using an interior point approach with an affine scaling
algorithm and projection. The method derives a scale-independent search direction and several
step sizes that enable the algorithm to reach a neighborhood of the optimum in a few
simulation runs. Finally, Abspoel et al. [3] present an approach that uses sequential linear
programming with move limits [25, p. 432] in concert with polynomial regression for
problems with random objective functions and constraints over an integer variable domain.

Another parametric model fitting approach is known as kriging. The kriging approach
builds a response surface via the combination of a fixed function $g(x)$ and departures from
the fixed function in the following form [91]:

$$f(x) = g(x) + Z(x),$$

where $Z(x)$ is a realization of a stochastic process with mean zero and a spatial correlation
function. The underlying model $g(x)$ globally approximates the true function and is
typically taken to be a constant but can be a general function with its own parameters. The
covariance of $Z$ under a Gaussian spatial correlation function is given by

$$\mathrm{Cov}[Z(X_i), Z(X_j)] = \sigma^2 R(X_i, X_j),$$

where $X_i$ and $X_j$ are two of $N$ design sites, $\sigma^2$ is the process variance, and $R$ is the $N \times N$
correlation matrix. A commonly used correlation matrix has ones along the diagonal and
the following off-diagonal elements [72]:

$$R(X_i, X_j) = \exp\left[-\sum_{k=1}^{n} \theta_k \left| X_i^k - X_j^k \right|^2\right],$$

where $\theta_k$ are the parameters used to fit the model and $X_i^k$ is the $k$th component of sample
point $X_i$. With this function, each predicted point is essentially a linear combination of
exponentially decaying functions that are based on the spatial distance between $X_i^k$ and
$X_j^k$.

Kriging models have gained popularity in optimization methods for expensive
deterministic simulation (e.g., [32, 128, 129, 147]). Recently they have been applied toward
problems involving randomness [65,73].
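To make the correlation structure concrete, the sketch below assembles the $N \times N$ Gaussian correlation matrix for a small set of design sites. The function name, the example sites, and the values of the parameters $\theta_k$ are hypothetical; in practice the $\theta_k$ would be estimated from the data (for example, by maximum likelihood), which is not shown here.

```python
import math


def gaussian_correlation(sites, theta):
    """Return R with R[i][j] = exp(-sum_k theta_k * (X_i^k - X_j^k)^2)."""
    N = len(sites)
    R = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            dist = sum(t * (a - b) ** 2
                       for t, a, b in zip(theta, sites[i], sites[j]))
            R[i][j] = math.exp(-dist)   # equals 1 on the diagonal
    return R


# Three hypothetical design sites in two continuous variables.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
theta = (1.0, 0.5)
R = gaussian_correlation(sites, theta)
```

Larger values of $\theta_k$ make the correlation decay faster along coordinate $k$, so responses at distant sites influence a prediction less.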

Artificial neural networks (ANNs) may also be considered response surface
approximation methods. ANNs are modelled after neurons of the human brain and consist
of an input layer, an output layer, and a series of hidden inner layers [54]. They can use
noisy response samples to approximate arbitrary smooth functions [21]; therefore, they may
be considered nonparametric fitting methods. A comprehensive introduction to ANNs can
be found in [150].

The presence of inner layers allows the ANN to learn nonlinear relationships between
input  and  output  quantities.  In  an  optimization  problem,  the  input  quantities  represent 
sampled  design  point  values  and  the  output  quantities  represent  responses.  Each  neuron 
in  an  ANN  has  an  activation  function  (sigmoid  function  or  step  function)  with  associated 
weight  parameters  that  are  analogous  to  regression  parameters  in  polynomial  regression. 
The  ANN  is  trained  on  the  sampled  design  points  by  finding  values  for  the  weights  that 
minimize  an  error  function  that  quantifies  the  difference  between  actual  response  values 
and  values  predicted  by  the  ANN.  In  this  manner,  the  ANN  is  a  predictive  tool  that 
produces new output (response) values for new input values. This method has been used, for
example, by Laguna and Martí [78], in application to optimization via stochastic simulation
of a jobshop. In their approach, the ANN is trained during the course of the search and
then  used  to  filter  out  candidate  designs  that  are  predicted  to  be  inferior  before  expending 
response  samples  for  those  designs. 

Another nonparametric fitting method is known as kernel regression or kernel
smoothing. Härdle [55] provides a detailed coverage of these methods, with particular attention
paid to the case of noisy responses (see [55, Sect. 2.1] for a discussion). The cornerstone of
kernel regression is the Nadaraya-Watson estimator [95,151], which is used to approximate
the objective function at a point $x$ according to

$$\hat{f}(x) = \frac{\sum_{i=1}^{N} F_i K_h(x - X_i)}{\sum_{i=1}^{N} K_h(x - X_i)} \qquad (2.8)$$

where $K_h$ is an appropriately selected kernel function that depends on parameter $h$ and the
distance from $x$ to each design site. As it has its origins in probability density estimation,
the kernel function must integrate to unity; i.e., $\int K_h(x)\,dx = 1$. The estimate $\hat{f}$ can be
thought of as the weighted average of all response samples $F_i$, where the weight received
by $F_i$ depends on $K_h$, the distance $(x - X_i)$, and the smoothing parameter $h$. The kernel
function determines the “shape” of the weights and $h$ determines the “size” of the
weights. For numerical reasons, kernel functions typically take on mound-shaped forms
such as the parabolic Epanechnikov kernel [43], which is zero outside some fixed
interval [55, p. 25], or a Gaussian. Kernel regression has been used iteratively within a stochastic
approximation framework (e.g., [97]) for the purpose of recursively estimating the root of
a noisy function.
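The estimator (2.8) is short enough to sketch directly. The code below is a minimal one-dimensional illustration; the function names and the choice of a scalar design variable are assumptions for the example, and the two kernels are the standard Gaussian and Epanechnikov forms scaled by the bandwidth $h$.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u); integrates to one over the real line."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def epanechnikov_kernel(u, h):
    """Parabolic kernel K_h(u); zero whenever |u| > h."""
    t = u / h
    return 0.75 * (1.0 - t * t) / h if abs(t) <= 1.0 else 0.0


def nadaraya_watson(x, sites, responses, h, kernel=gaussian_kernel):
    """Kernel-weighted average of the responses F_i at the point x, as in (2.8)."""
    weights = [kernel(x - xi, h) for xi in sites]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no design site lies within the kernel support of x")
    return sum(w * f for w, f in zip(weights, responses)) / total


# Hypothetical data: the site at 10.0 is outside the Epanechnikov support
# for x = 0.5 and h = 2.0, so it receives zero weight.
estimate = nadaraya_watson(0.5, [0.0, 1.0, 10.0], [1.0, 3.0, 100.0],
                           h=2.0, kernel=epanechnikov_kernel)
```

Because the weights are normalized by their sum, the estimate is unchanged if the kernel is multiplied by a constant; only its shape and the bandwidth $h$ matter.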

Additional fitting methods for approximating functions have been proposed and are
simply mentioned here. One such method is known as multivariate adaptive regression
splines (MARS) [64]. This method adaptively selects a set of basis functions for
approximating the response function through a forward/backward iterative approach. Another
method involves the use of radial basis functions [64]. This method uses linear
combinations of a radially symmetric function based on the Euclidean distance or a similar metric
to approximate the response functions.




There is no general convergence theory for RSM. Indeed, this would be
difficult to establish for RSM in its pure form since optimization is performed on an approximation of
the true objective function. Even in the absence of random noise, the approximate model
contains inaccuracies which are exacerbated if the true function is highly nonlinear.
Furthermore, due to their interpolatory nature, the methods are usually restricted to entirely
continuous domains. However, extensions to integer variables are possible by relaxing
integrality constraints on the approximate model and ensuring that solutions encountered
during the search are mapped to admissible discrete points in the search space (e.g., [3]).
This approach is unsuitable with respect to categorical variables, necessitating the
construction of an independent response surface for each combination of categorical variable
settings, resulting in escalating computational requirements.

2.1.6  Other  Methods 

A brief mention of other methods used for stochastic optimization is warranted,
particularly since most commercially available simulation software packages that offer some
optimization  functionality  do  not  use  the  methods  of  the  previous  sections.  Rather,  most 
packages  use  heuristic  search  procedures  [48,  Table  1]. 

A search heuristic is a “technique which seeks good (i.e. near-optimal) solutions at a
reasonable computational cost without being able to guarantee feasibility or optimality, or
even in many cases to state how close to optimality a particular feasible solution is” [112, p.
6]. Heuristics typically use random or deterministic sampling as a tool to guide exploitive
search techniques that are more efficient than pure random sampling [54]. Examples of
search heuristics include evolutionary algorithms (genetic algorithms, evolutionary
strategies, and evolutionary programming), scatter search, tabu search, and simulated annealing.
Most heuristics are devised with mechanisms to enable global search and escape local
minima; hence, they have found successful application for large, nonconvex, and combinatorial
problems. In recent years, the use of search heuristics for stochastic optimization via
simulation has grown rapidly, as evidenced by its dominance in software. This, in part, is a reflection
of their relative ease of use and generality (they can easily be adapted to mixed-variable
problems and require only black-box response samples). However, their application to
stochastic problems has been largely unmodified from their original form, relying on inherent
robustness to noise rather than explicitly accounting for noise [48].

2.1.7  Summary  of  Methods 

Of the methods presented in this section, stochastic approximation possesses the
richest convergence theory. However, since these methods represent a class of gradient-based
methods, they are geared toward problems with continuous variables. In theory, problems
with a mix of integer and continuous variables (mixed-integer problems) could be addressed
via methods that iteratively solve subproblems in which some integrality restrictions are
relaxed, such as branch-and-bound. However, there is little evidence from the literature
that such approaches have been applied in conjunction with SA and, in fact, only in limited
applications has SA been extended to problems with only integer-valued variables [50,51].
Random search methods are the most general methods that possess some convergence
results, but a general convergence theory over a mixed-variable domain has not been
established. Ranking and selection procedures inherently provide a sense of convergence via
probability guarantees, but if applied unmodified, are only able to accommodate a small,
discrete set of designs. Direct search methods that have been applied to stochastic
optimization have, thus far, not considered a mixture of variable types nor do they yet possess
a general convergence theory. Finally, response surface methods, while useful for
modeling and analyzing the input/output relationship between design variables and responses,
do not possess convergence properties and apply, in general, to continuous variables only.
However, they can provide a useful means to improve more rigorous methods.

This  review  illustrates  the  need  for  convergent  algorithms  that  can  treat  general, 
mixed-variable optimization problems with noisy response functions. The generalized
pattern search class of algorithms, reviewed in the following section, has been shown to be
convergent  for  deterministic  optimization  problems  in  a  series  of  recent  results.  These 
methods  will  provide  the  basis  for  a  convergent  class  of  algorithms  in  this  research. 

2.2  Generalized  Pattern  Search 

In  recent  years,  research  in  direct  search  theory  has  led  to  several  results  for  the 
subclass  of  direct  search  algorithms  known  as  pattern  search.  This  section  describes  the 
various  pattern  search  approaches  found  in  the  literature,  beginning  with  the  unconstrained 
case  over  continuous  variables  and  followed  by  extensions  to  more  difficult  problem  settings. 

2.2.1  Pattern  Search  for  Continuous  Variables 

In introducing and analyzing the method, Torczon [143] demonstrated that a
generalized class of pattern search methods unifies various distinct pattern search
techniques; namely, the Hooke-Jeeves method, coordinate search with fixed step lengths, EVOP
with factorial designs [33], and the multidirectional search of Dennis and Torczon [38].
Torczon’s paper was significant in that it established a global convergence theory without ever
computing or explicitly approximating derivatives.

Pattern  search  algorithms  are  defined  through  a  finite  set  of  directions  used  at  each 
iteration.  The  direction  set  and  a  step  length  parameter  are  used  to  construct  a  conceptual 
mesh  centered  about  the  current  iterate  (the  incumbent).  Trial  points  are  selected  from  this 
discrete  mesh,  evaluated,  and  compared  to  the  incumbent  in  order  to  select  the  next  iterate. 
If an improvement is found among the trial points, the iteration is declared successful and
the mesh is retained or coarsened; otherwise, the mesh is refined and a new set of trial
points is constructed. Torczon proved that, for a continuously differentiable function $f$, a
subsequence of the iterates $\{x_k\}$ produced by the generalized class of methods converges
to a stationary point of $f$ (i.e., $\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0$) by showing that the mesh size
(step length) parameter becomes arbitrarily small.

The  mesh  is  defined  by  a  finite  set  of  directions  that  must  be  sufficiently  rich  to  ensure 
that  a  component  of  the  steepest  descent  direction  can  be  captured  by  at  least  one  element 
of  the  set  when  the  current  iterate  is  not  a  stationary  point.  Lewis  and  Torczon  [79] 
applied  the  theory  of  positive  linear  dependence  [37]  to  establish  criteria  for  a  core  set  of 
directions. The core direction set must be drawn from a set that positively spans the space
$\mathbb{R}^n$, where a positive spanning set of directions is defined as one in which nonnegative linear
combinations of all directions span $\mathbb{R}^n$. Typically this set forms a positive basis, which is the
smallest proper subset of a positive spanning set that still positively spans $\mathbb{R}^n$. A positive
basis contains between $n + 1$ (a minimal set) and $2n$ (a maximal set) elements; therefore,
the  worst  case  number  of  trial  points  per  iteration  can  be  bounded  to  n  +  1  points  by  an 
appropriately  constructed  direction  set. 
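The iteration described above can be sketched for the unconstrained, deterministic case as follows. This is a simplified illustration rather than any particular published variant: the function names are hypothetical, the poll accepts the first improving trial point, the mesh is retained on success and halved on failure, and the two direction sets are one common choice each of a maximal ($2n$) and a minimal ($n + 1$) positive basis.

```python
def maximal_positive_basis(n):
    """The 2n coordinate directions +e_i and -e_i."""
    dirs = []
    for i in range(n):
        for sign in (1.0, -1.0):
            d = [0.0] * n
            d[i] = sign
            dirs.append(d)
    return dirs


def minimal_positive_basis(n):
    """The n + 1 directions e_1, ..., e_n and -(e_1 + ... + e_n)."""
    dirs = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    dirs.append([-1.0] * n)
    return dirs


def pattern_search(f, x0, directions, step=1.0, tol=1e-6, max_iter=10000):
    """Poll mesh points x + step * d; refine the mesh when no poll point improves."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for d in directions:
            trial = [xi + step * di for xi, di in zip(x, d)]
            f_trial = f(trial)
            if f_trial < fx:            # successful iteration: move, keep the mesh
                x, fx, improved = trial, f_trial, True
                break
        if not improved:                # unsuccessful iteration: refine the mesh
            step *= 0.5
    return x, fx, step


def quadratic(x):
    """Hypothetical smooth test function with minimizer (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_max, f_max, _ = pattern_search(quadratic, [0.0, 0.0], maximal_positive_basis(2))
x_min, f_min, _ = pattern_search(quadratic, [0.0, 0.0], minimal_positive_basis(2))
```

The minimal basis polls fewer points per iteration, while the maximal basis offers more directions from which a descent direction may be found.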

Lewis and Torczon extend the results of [143] and [79] to problems with bound
constraints [80] and linear constraints [81]. In these situations, the set of search directions
must  be  sufficiently  rich  to  ensure  that  some  of  the  positive  spanning  directions  conform  to 
the  geometry  of  the  constraint  boundaries.  With  this  construct,  when  the  current  iterate 
is  not  a  constrained  stationary  point,  there  is  at  least  one  feasible  direction  of  descent  from 
which  to  choose. 

Audet and Dennis [14] present an alternative but equivalent version of pattern search
and attendant convergence theory for bound and linearly constrained problems. In their
analysis, various convergence results are reported that relate the optimality conditions
to  smoothness  properties  of  the  objective  function  and  to  the  defining  directions  of  the 
algorithm.  It  should  be  noted  that  Audet  and  Dennis  explicitly  separate  the  search  for 
an  improved  iterate  into  a  SEARCH  and  a  POLL  step.  The  optional  SEARCH  step  employs 
a  user-defined  strategy  to  seek  an  improved  mesh  point.  This  step  contributes  nothing  to 
the  convergence  theory,  but  allows  the  user  great  flexibility  to  apply  any  desired  heuristic 
to speed convergence. For example, approaches may include randomly selecting a
space-filling set of points using Latin hypercube design or orthogonal arrays, or applying a few
iterations  of  a  genetic  algorithm.  For  computationally  expensive  functions,  one  common 
approach  is  to  use  previously  sampled  responses  to  construct  and  optimize  a  less  expensive 
surrogate  function  on  the  mesh  using  the  methods  of  Section  2.1.5.  Such  methods  have 
been  implemented  within  a  pattern  search  framework  without  sacrifice  to  the  convergence 
theory  [30-32, 39, 86, 127, 145, 147] . 

Audet and Dennis [16] extend their approach to nonlinear constraints by
implementing a filter method [45], which accepts new iterates if either the objective function or an
aggregate constraint violation function is reduced. In an alternative approach to nonlinear
constraint handling, Lewis and Torczon [82] use an augmented Lagrangian function from
Conn, Gould, and Toint [36] to construct a bound constrained subproblem that is solved
approximately using a pattern search. Finally, Audet and Dennis [15] recently developed
an extension to GPS, known as Mesh Adaptive Direct Search (MADS), that replaces the
filter method with a barrier method that assigns a value of $+\infty$ to infeasible iterates
without evaluating their objective function. The key to MADS is to conduct the POLL step
using a dense set of directions that enables the resulting algorithms to retain convergence
properties under weak constraint qualifications at the limit point.

2.2.2 Pattern Search for Mixed Variables

A  pattern  search  framework  for  MVP  problems  with  bound  and  linear  constraints  was 
developed  by  Audet  and  Dennis  [13]  by  incorporating  user-defined  discrete  neighborhoods 
into  the  definition  of  the  mesh.  The  methodology  was  further  generalized  in  [1]  and  [2]  to 
include  nonlinear  constraints.  In  the  mixed  variable  case,  the  POLL  step  is  conducted  by 
searching  a  subset  of  the  mesh  with  respect  to  the  continuous  variables  and  searching  a 
user-defined  discrete  neighbor  set.  If  the  POLL  step  does  not  yield  an  improved  solution,  an 
EXTENDED  POLL  step  is  initiated  in  the  continuous  neighborhood  of  any  discrete  neighbor 
with an objective function value sufficiently close (i.e., within a tolerance) to that of the
incumbent.  This  aspect  of  the  algorithm  allows  extension  of  the  convergence  theory  to  the 
mixed  variable  domain  but  incurs  a  cost  of  more  function  evaluations. 

2.2.3  Pattern  Search  for  Random  Response  Functions 

Pattern  search  applied  to  stochastic  optimization  problems  is  rare.  Ouali  et  al.  [104] 
applied  multiple  repetitions  of  generalized  pattern  search  directly  to  a  stochastic  simulation 
model  to  seek  minimum  cost  maintenance  policies  where  costs  were  estimated  by  the  model. 
In a more rigorous approach, Trosset [146] analyzed convergence in the unconstrained,
continuous case by viewing the iterates as a sequence of binary ordering decisions. By
defining $\Lambda_k = f(X_k) - f(Y)$, where $Y$ is a trial point from the mesh, the following hypothesis
test,

$$H_0 : \Lambda_k \le 0 \quad \text{versus} \quad H_1 : \Lambda_k > 0 \qquad (2.9)$$

accepts $Y$ as the new iterate if the null hypothesis is rejected. Such a test is subject to Type
I and Type II errors. A Type I error is made if $H_0$ is rejected when it is actually true and
occurs with probability $\alpha$; a Type II error is made if $H_0$ is accepted when $H_1$ is true and
occurs with probability $\beta$. A selection of a sequence of significance levels $\{\alpha_k\}$ such that
$\sum_{k=1}^{\infty} \alpha_k < \infty$ ensures (with probability one) a finite number of Type I errors. In addition, let
$\{\lambda_k\}$ be a sequence of alternatives satisfying $\lambda_k > 0$, $\lambda_k = o(\Delta_k)$, and $\lambda_k \to 0$, where $\Delta_k$
denotes the mesh size parameter, that require
power $1 - \beta_k$ when conducting the test in (2.9). Choosing a sequence $\{\beta_k\}$ such that
$\sum_{k=1}^{\infty} \beta_k < \infty$ ensures a finite number of Type II errors when $\Lambda_k \ge \lambda_k$. Hence, Trosset claims
that a sequence of iterates from a GPS algorithm can be shown to converge almost surely
to a stationary point of $f$ but, in practice, would require a very large number of samples
to guarantee convergence [146]. He uses a power analysis, a statistical technique designed
to determine the number of samples required to guarantee a probability $1 - \beta$ (known as
the power of the test) of rejecting $H_0$ when $H_1$ is true, to show that the number of samples
per iteration grows faster than the squared reciprocal of the mesh size parameter.
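The growth in sampling effort can be illustrated with a textbook power calculation. The sketch below is not Trosset's analysis; it assumes normally distributed responses with a known common standard deviation `sigma` and a one-sided z-test on the difference of two sample means, for which roughly $2\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2 / \lambda^2$ samples per point are needed. The particular sequences for the error probabilities, mesh size, and alternatives are hypothetical choices that satisfy the summability and little-o conditions stated above.

```python
import math
from statistics import NormalDist


def samples_per_point(alpha, beta, lam, sigma=1.0):
    """Samples at each of two points so that a level-alpha one-sided z-test on the
    difference of sample means has power 1 - beta at the alternative lam."""
    z = NormalDist().inv_cdf
    return math.ceil(2.0 * (sigma * (z(1.0 - alpha) + z(1.0 - beta)) / lam) ** 2)


def schedule(k):
    """Hypothetical sequences: summable error rates, alternatives of order o(mesh)."""
    alpha_k = beta_k = 0.05 / k ** 2     # sum over k is finite
    mesh_k = 2.0 ** (-k)                 # mesh size parameter
    lam_k = mesh_k / k                   # lam_k / mesh_k -> 0
    return alpha_k, beta_k, mesh_k, lam_k


rows = []
for k in range(1, 9):
    alpha_k, beta_k, mesh_k, lam_k = schedule(k)
    n_k = samples_per_point(alpha_k, beta_k, lam_k)
    # If n_k grew only like 1 / mesh_k**2, the last entry would stay bounded.
    rows.append((k, mesh_k, n_k, n_k * mesh_k ** 2))
```

Under these assumptions the product of the sample size and the squared mesh size keeps increasing with the iteration index, which is consistent with the growth rate described above.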

2.3 Summary

Section  2.1  reviewed  the  various  approaches  to  sampling-based  stochastic  optimization. 
Each  class  of  methods  was  discussed  with  regard  to  the  important  properties  of  convergence 
and  generality.  The  review  illustrates  the  need  for  rigorous  algorithms  that  can  treat  the 
target  class  of  problems  (1.2).  The  GPS  class  of  algorithms,  reviewed  in  Section  2.2, 
possesses desirable convergence properties for deterministic problems. Due to its reliance
on only response samples (i.e., no derivatives) and its applicability over mixed variable
domains, GPS also possesses desirable generality properties. Extension of pattern search
theory  to  the  stochastic  setting  has  only  recently  been  introduced  [104, 146],  yet  has  not 
been  thoroughly  studied  to  yield  new  theoretical  or  empirical  results.  In  a  novel  approach 
that  combines  pattern  search  with  ranking  and  selection,  the  remaining  chapters  provide 
results of both a theoretical and empirical nature.

Chapter 3 - Algorithmic Framework and Convergence Theory

This chapter presents the algorithmic framework and convergence theory for mixed
variable stochastic optimization with bound and linear constraints on the continuous
variables using a combined generalized pattern search with ranking and selection approach.
Section 3.1 provides some basic definitions for mixed variable domains. Section 3.2 presents
the mathematical framework for construction of the mesh from which candidate solutions
are drawn. Section 3.3 addresses the handling of bound and linear constraints within this
framework. Section 3.4 summarizes the traditional mixed-variable GPS approach used for
deterministic optimization. Section 3.5 discusses the alternative approach to iterate
selection in the presence of random responses by selecting from among a number of candidates
using R&S. Section 3.6 presents and describes the new class of algorithms, and Section
3.7 provides a theoretical convergence analysis that proves almost sure convergence to an
appropriately defined first-order stationary point. Finally, Section 3.8 illustrates a basic
version of the algorithm on a simple example.

3.1 Mixed Variables

In  order  to  devise  algorithms  for  the  target  class  of  problems,  it  is  important  to  have 
notions  of  local  optimality  and  stationarity  for  mixed  variable  domains.  Local  optimality  in 
a  continuous  domain  is  well  established,  and  even  for  discrete  variables,  it  is  not  difficult  to 
define.  However,  since  categorical  variables  typically  have  no  inherent  ordering,  the  concept 
of  a  local  neighborhood  must  be  defined  in  the  context  of  the  problem.  For  example,  in 
Kokkolaras  et  al.  [74]  the  optimization  problem  was  to  determine  the  optimal  number  and 
types of insulators in a thermal insulation system. Given a design, a discrete neighbor was
defined to be any design in which an insulator was replaced with one of a different material,
or one in which the number of insulators was increased or decreased by one.

To generalize this for MVP problems, the set of discrete neighbors is defined by a
set-valued function $\mathcal{N} : \Theta \to 2^{\Theta}$, where $2^{\Theta}$ denotes the power set of $\Theta$. The notation $y \in \mathcal{N}(x)$
means that the point $y$ is a discrete neighbor of $x$. By convention, $x \in \mathcal{N}(x)$ for each $x \in \Theta$,
and it is assumed that $\mathcal{N}(x)$ is finite.
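A user-defined neighbor function of this kind might look like the following sketch, which loosely mirrors the insulation example. The material names, the representation of a design as a tuple of materials, and the rule that insulators are added or removed at the end of the tuple are all assumptions made for illustration; they are not taken from [74]. Continuous variables are omitted for brevity.

```python
MATERIALS = ("nylon", "teflon", "epoxy")   # hypothetical categorical values


def discrete_neighbors(design, materials=MATERIALS):
    """Return the finite neighbor set N(design) for a tuple of insulator materials.

    By convention the design itself belongs to its own neighbor set.
    """
    neighbors = {design}
    # Replace one insulator with an insulator of a different material.
    for i, current in enumerate(design):
        for other in materials:
            if other != current:
                neighbors.add(design[:i] + (other,) + design[i + 1:])
    # Decrease the number of insulators by one (assumed: drop the last one).
    if len(design) > 1:
        neighbors.add(design[:-1])
    # Increase the number of insulators by one (assumed: append at the end).
    for material in materials:
        neighbors.add(design + (material,))
    return neighbors


design = ("nylon", "teflon")
neighbor_set = discrete_neighbors(design)
```

A richer neighbor definition enlarges this set, which strengthens the resulting notion of local optimality at the price of more function evaluations per iteration.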

Local  optimality  in  a  mixed  variable  domain  can  be  defined  in  terms  of  the  set  of 
discrete  neighbors.  The  following  definition  is  due  to  Audet  and  Dennis  [13]. 

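Definition 3.1 (Local Minimizer) A point $x = (x^c, x^d) \in \Theta$ is a local minimizer of $f$ with
respect to the set of neighbors $\mathcal{N}(x) \subset \Theta$ if there exists an $\epsilon > 0$ such that $f(x) \le f(v)$ for
all

$$v \in \Theta \cap \bigcup_{y \in \mathcal{N}(x)} \left( B(y^c, \epsilon) \times y^d \right). \qquad (3.1)$$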

This definition is stronger than simply requiring optimality with respect to the continuous
variables and also with respect to discrete neighbors. It requires the local minimizer
to have lower function value than any point in a neighborhood of each discrete neighbor.
Furthermore, the quality of the local minimizer is impacted by the user-defined discrete
neighborhood. A larger set of discrete neighbors results in a more global local minimizer,
but algorithms that require function evaluations at each discrete neighbor will do so at
a greater cost. Since optimization algorithms are rarely guaranteed to converge to local
optimizers in general, convergence to a point satisfying certain first-order stationarity
conditions is a good substitute. The following definition, which is similar in form to that of
[84] for unconstrained problems, is implied but not formally stated in [13] and [1]. The
notation $\nabla^c f$ represents the gradient of $f$ with respect to the continuous variables while
holding the discrete variables constant.

Definition 3.2 (First-order necessary conditions in mixed-variable domain) A point $x \in \Theta$
satisfies first-order necessary conditions for optimality if

1. $(w^c - x^c)^T \nabla^c f(x) \ge 0$ for any feasible $(w^c, x^d) \in \Theta$;

2. $f(x) \le f(y)$ for any discrete neighbor $y \in \mathcal{N}(x) \subset \Theta$;

3. $(w^c - y^c)^T \nabla^c f(y) \ge 0$ for any discrete neighbor $y \in \mathcal{N}(x)$ satisfying $f(y) = f(x)$ and
for any feasible $(w^c, y^d) \in \Theta$.

The convergence analysis of Section 3.7 shows that, under reasonable assumptions, certain
subsequences  generated  by  the  class  of  algorithms  introduced  in  this  chapter  converge  with 
probability  one  (almost  surely)  to  limit  points  satisfying  Conditions  1-3  of  Definition  3.2. 
However,  the  notions  of  convergence  and  continuity  in  a  mixed  variable  domain  are  first 
required.  The  following  two  definitions  appear  in  [1],  with  the  first  also  similar  to  one  in 


Definition 3.3 (Convergence, limit point) Let $\Theta \subseteq (\mathbb{R}^{n^c} \times \mathbb{Z}^{n^d})$ be a mixed variable
domain. A sequence $\{x_i\} \in \Theta$ is said to converge to $x \in \Theta$ if, for every $\epsilon > 0$, there exists
a positive integer $N$ such that $x_i^d = x^d$ and $\|x_i^c - x^c\| < \epsilon$ for all $i > N$. The point $x$ is said
to be the limit point of the sequence $\{x_i\}$.


Definition 3.4 (Neighbor Set Continuity) A set-valued function $\mathcal{N} : \Theta \subseteq (\mathbb{R}^{n^c} \times \mathbb{Z}^{n^d}) \to 2^{\Theta}$
is continuous at $x \in \Theta$ if, for every $\epsilon > 0$, there exists $\delta > 0$ such that, whenever $u \in \Theta$
satisfies $u^d = x^d$ and $\|u^c - x^c\| < \delta$, the following two conditions hold:

1. If $y \in \mathcal{N}(x)$, there exists $v \in \mathcal{N}(u)$ satisfying $v^d = y^d$ and $\|v^c - y^c\| < \epsilon$.

2. If $v \in \mathcal{N}(u)$, there exists $y \in \mathcal{N}(x)$ satisfying $y^d = v^d$ and $\|y^c - v^c\| < \epsilon$.

Thus, given a convergent subsequence of iterates and a convergent subsequence of its
discrete neighbors, continuity of $\mathcal{N}$ means that the limit points of the discrete neighbor points
are themselves discrete neighbors of the limit point of the iterates [1].

3.2 Positive Spanning Sets and Mesh Construction

Pattern  search  algorithms  produce  a  sequence  of  iterates  that  are  selected  from  a 
discrete  mesh  in  the  search  domain.  Construction  of  the  mesh  relies  on  the  following 
definitions, due to Davis [37]:

Definition 3.5 (Positive combination) A positive combination of the set of vectors
$V = \{v_i\}_{i=1}^{r}$ is a linear combination $\sum_{i=1}^{r} c_i v_i$, where $c_i \ge 0$, $i = 1, 2, \ldots, r$.


Definition 3.6 (Positive spanning set, positively span) A finite set of vectors $W = \{w_i\}_{i=1}^{r}$
forms a positive spanning set for $\mathbb{R}^n$ if every $v \in \mathbb{R}^n$ can be expressed as a positive
combination of vectors in $W$. The set of vectors $W$ is said to positively span $\mathbb{R}^n$.


Definition 3.7 (Positive basis) A positive spanning set of vectors $W$ is said to be a positive
basis for $\mathbb{R}^n$ if no proper subset of $W$ positively spans $\mathbb{R}^n$.

The motivation for using positive spanning sets in GPS algorithms is encompassed in
the following theorem, due to Davis [37].

Theorem 3.8 (Davis [37]). A set $D$ positively spans $\mathbb{R}^n$ if and only if, for all nonzero
$v \in \mathbb{R}^n$, $v^T d > 0$ for some $d \in D$.

If the gradient vector $\nabla f(x)$ exists at $x$ and is nonzero, then, by choosing $v = -\nabla f(x)$,
there exists a $d \in D$ such that $\nabla f(x)^T d < 0$. Thus, at least one element of $D$ is a descent
direction, which ensures that GPS algorithms can always find improving points when the
gradient is nonzero.
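The descent-direction argument can be illustrated with a minimal sketch. It assumes the maximal positive basis $[I, -I]$ for $\mathbb{R}^2$; the function name `descent_directions` and the sample gradient are hypothetical and chosen only for illustration.

```python
def descent_directions(grad, directions):
    """Return every direction d with grad^T d < 0, i.e. the descent directions."""
    return [d for d in directions
            if sum(g * d_i for g, d_i in zip(grad, d)) < 0]


# Maximal positive basis [I, -I] for R^2 (2n = 4 directions).
D = [(1, 0), (0, 1), (-1, 0), (0, -1)]

# Hypothetical nonzero gradient of f at some point x.
grad = (3.0, -2.0)
found = descent_directions(grad, D)
```

For any nonzero gradient the returned list is non-empty, which is the content of Theorem 3.8 applied to $v = -\nabla f(x)$.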

In  the  original  paper  on  pattern  search  for  continuous  variables,  Torczon  [143]  defined 
the  mesh  as  follows: 

$$M_k(x_k) = \{x_k + \Delta_k d : d \in \Gamma_k\}, \tag{3.2}$$

where $\Delta_k$ is the mesh size parameter at iteration $k$ and $d \in \Gamma_k$ means that $d$ is a column
of the direction matrix $\Gamma_k$. The matrix $\Gamma_k$ may be decomposed into two matrices,

$$\Gamma_k = [\, D_k \;\; L_k \,], \tag{3.3}$$

where $D_k$, an $n \times p$ matrix with $p = 2n$, is a core set of directions and $L_k$, an $n \times q$ matrix
with $q \ge 1$, contains at least the column of zeroes and any additional columns of directions
that allow algorithm refinements. Lewis and Torczon [79] redefined the requirements for $D_k$,
restricting $p$ to the range $n + 1 \le p \le 2n$ and ensuring that $D_k$ forms a positive basis
according to Definition 3.7. Before the mesh size parameter is reduced, each mesh point
defined by the core set of directions, i.e., $\{x_k + \Delta_k d : d \in D_k\}$, must be tried and declared
unsuccessful.

Audet  and  Dennis  [13]  provide  an  alternative  but  equivalent  definition  for  continuous 
variables  using  the  following  mesh  construct, 

$$M_k(x_k) = \{x_k + \Delta_k D z : z \in \mathbb{Z}_+^{|D|}\}, \tag{3.4}$$

where $z \in \mathbb{Z}_+^{|D|}$ represents a $|D|$-dimensional vector of nonnegative integers. The directions in $D$
form a positive spanning set according to Definition 3.6 and must satisfy the restriction,

$$D = GZ, \tag{3.5}$$

where $G \in \mathbb{R}^{n \times n}$ is a nonsingular generating matrix and $Z \in \mathbb{Z}^{n \times |D|}$. One or more points
from (3.4) may be tried for improvement during an optional SEARCH step of their algorithm.
If the step does not discover an improved solution, the POLL step is invoked in which points
from a poll set defined as,

$$P_k(x_k) = \{x_k + \Delta_k d : d \in D_k \subseteq D\}, \tag{3.6}$$

are tested until an improved solution is found or the set is exhausted. Note that $D_k$ is also
a positive spanning set and $P_k$, the set of neighboring mesh points, is a subset of $M_k$.
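A minimal sketch of the poll set construction in (3.6) follows. It assumes $D_k = [I, -I]$ in two dimensions and uses exact rational arithmetic so that the poll points stay exactly on the mesh; the function name `poll_set` and the numerical values are illustrative assumptions, not taken from the source.

```python
from fractions import Fraction


def poll_set(x, delta, directions):
    """Poll set {x + delta * d : d in directions}, as in (3.6)."""
    return [tuple(x_i + delta * d_i for x_i, d_i in zip(x, d))
            for d in directions]


D_k = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # assumed positive basis [I, -I]
x_k = (Fraction(1), Fraction(2))           # hypothetical incumbent
delta_k = Fraction(1, 2)                   # hypothetical mesh size parameter
P_k = poll_set(x_k, delta_k, D_k)
```

Each poll point is a mesh point of (3.4) obtained by setting a single entry of $z$ to one.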

For MVP problems the mesh is defined differently, but in a way that reduces to the
basic mesh structure of (3.4) if there are no discrete variables. A set of positive spanning
directions $D^i$ is constructed for each unique combination $i = 1, 2, \ldots, i_{\max}$, of values that
the discrete variables may take, i.e.,

$$D^i = G_i Z_i, \tag{3.7}$$

where $G_i \in \mathbb{R}^{n^c \times n^c}$ is a nonsingular generating matrix and $Z_i \in \mathbb{Z}^{n^c \times |D^i|}$. The mild
restrictions imposed by (3.5) and (3.7) are necessary for the convergence theory. The mesh is then
formed as the direct product of $\Theta^d$ with the union of a finite number of meshes in $\Theta^c$, i.e.,

$$M_k(x_k) = \Theta^d \times \bigcup_{i=1}^{i_{\max}} \left\{ x_k^c + \Delta_k D^i z \in \Theta^c : z \in \mathbb{Z}_+^{|D^i|} \right\}. \tag{3.8}$$

At iteration $k$, let $D_k^i \subseteq D^i$ denote the set of poll directions corresponding to the
$i$-th set of discrete variable values and define $D_k = \bigcup_{i=1}^{i_{\max}} D_k^i$. The poll set is defined with respect
to the continuous variables centered at the incumbent while holding the discrete variables
constant. Its form is

$$P_k(x_k) = \{x_k + \Delta_k (d, 0) \in \Theta : d \in D_k^i\} \tag{3.9}$$

for some $1 \le i \le i_{\max}$, where $(d, 0)$ denotes the partitioning into continuous and discrete
variables; 0 means the discrete variables remain unchanged, i.e., $x_k + \Delta_k (d, 0) = (x_k^c +
\Delta_k d, x_k^d)$.

3.3  Bound  and  Linear  Constraint  Handling 

An  appropriate  means  to  search  regions  near  the  constraint  boundaries  is  necessary  to 
find  stationary  points  that  reside  there.  In  this  situation,  the  direction  set  is  required  to  be 
sufficiently  rich  so  that  the  polling  directions  of  the  GPS  algorithm  can  be  chosen  to  conform 
to  the  geometry  of  the  constraint  boundaries.  In  [80]  and  [81],  Lewis  and  Torczon  show  how 
this can be done for every point in $\Theta$ via the inclusion of generators for the tangent cone to
the  feasible  region  as  a  subset  of  directions  in  the  direction  set.  The  concepts  of  a  tangent 
vector  and  tangent  cone  are  formalized  in  the  following  definition,  taken  from  [100,  p.  587]. 

Definition 3.9 (Tangent, tangent cone) A vector $w \in \mathbb{R}^n$ is tangent to $\Theta$ at $x \in \Theta$ if, for
all vector sequences $\{x_i\}$ with $x_i \to x$ and $x_i \in \Theta$, and all positive scalar sequences $t_i \downarrow 0$,
there is a sequence $w_i \to w$ such that $x_i + t_i w_i \in \Theta$ for all $i$. The tangent cone at $x$ is the
collection of all tangent vectors to $\Theta$ at $x$.

If the current iterate is within $\epsilon > 0$ of a constraint boundary, the tangent cone
$K^{\circ}(x, \epsilon)$ may be generated as the polar of the cone $K(x, \epsilon)$ of outward pointing normals
for the constraints within $\epsilon$ of $x_k$. This is illustrated for two dimensions in Figure 3.1.

Inclusion  of  the  tangent  cone  generators  in  the  set  of  directions  used  by  pattern  search 
is  sufficient  to  ensure  convergence.  An  algorithm  for  computing  these  directions  in  the 
absence  of  degeneracy  is  given  in  [81].  It  should  be  noted  that,  since  the  target  class  of 
problems  is  restricted  to  a  finite  number  of  linear  constraints,  there  are  only  a  finite  number 
of  tangent  cone  generators  for  the  entire  feasible  region,  which  prevents  violation  of  the 
finiteness of the direction sets, $D^i$, $i = 1, 2, \ldots, i_{\max}$. However, this would not hold in the
presence  of  nonlinear  constraints,  which  are  not  treated  in  this  research. 


Figure 3.1. Directions that conform to the boundary of $\Theta$ (from [81])


To  simplify  the  convergence  analysis  in  Section  3.7  and  avoid  reintroducing  the  method 
of  Lewis  and  Torczon  [81],  the  following  more  general  definition  from  [14]  is  provided.  The 
construction  and  inclusion  of  tangent  cone  generators  will  be  assumed. 

Definition 3.10 (Conforming directions) Let $D$ be a positive spanning set in $\mathbb{R}^n$. A rule
for selecting the positive spanning sets $D_k = D(k, x_k) \subseteq D$ conforms to $\Theta^c$ for some $\epsilon > 0$,
if, at each iteration $k$ and for each $y$ in the boundary of $\Theta^c$ for which $\|y - x_k\| < \epsilon$, the
tangent cone $K^{\circ}(x, \epsilon)$ is generated by nonnegative linear combinations of a subset of the
columns of $D_k$.

With conforming directions included, linear constraints can be treated with the simple
barrier approach. That is, if a linear constraint is violated at a trial point, then a function
value of $+\infty$ is assigned without computing the objective function value there, thus saving
computational expense.
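The barrier approach can be sketched as follows, assuming the linear constraints are written as $Ax \le b$ (this form, the function name `barrier_value`, and the example data are assumptions made for illustration). The objective is only evaluated when the trial point is feasible.

```python
import math


def barrier_value(f, x, A, b):
    """Return f(x) if every row of A x <= b holds; otherwise +infinity
    without calling f at all."""
    for row, rhs in zip(A, b):
        if sum(a * x_i for a, x_i in zip(row, x)) > rhs:
            return math.inf
    return f(x)


calls = []  # records the points at which f was actually evaluated


def f(x):
    calls.append(x)
    return (x[0] - 1.0) ** 2 + x[1] ** 2


# Hypothetical feasible region: x1 + x2 <= 1, x1 >= 0, x2 >= 0.
A = [[1, 1], [-1, 0], [0, -1]]
b = [1, 0, 0]

inside = barrier_value(f, (0.25, 0.5), A, b)
outside = barrier_value(f, (0.75, 0.5), A, b)
```

In a stochastic setting the saving is larger than in the deterministic case, since an infeasible trial point would otherwise consume several simulation replications.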

3.4 The MGPS Algorithm for Deterministic Optimization

Within the GPS framework, mixed variables are accommodated via a user-defined set
of discrete neighbors $\mathcal{N}$ introduced in Section 3.1 at each point in the domain. Elements
in the neighbor set include the current point and for the remaining elements involve, at
a minimum, changes to the values of the discrete variables. For example, if the discrete
variables are integers, a neighborhood structure may be defined by holding the continuous
variables constant and allowing a maximum change of one unit for only one of the discrete
variables, i.e., $\mathcal{N}(x_k) = \{y \in \Theta : y^c = x_k^c, \ \|y^d - x_k^d\|_1 \le 1\}$. This may not be appropriate
if the discrete variables are all categorical since the ordering implied by integer values no
longer applies; changing a categorical variable value from “1” to “3” may be as valid as a
change from “1” to “2”. Note that discrete neighbors may require accompanying changes to
the continuous variables in order for the solution to make sense for the particular problem.
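The integer neighborhood structure described above can be sketched as a small function. The bounds `lower` and `upper` on the integer variables, and the name `integer_neighbors`, are assumptions introduced for illustration; a categorical problem would need a problem-specific rule instead.

```python
def integer_neighbors(x_c, x_d, lower, upper):
    """Discrete neighbors of (x_c, x_d): the point itself plus every point that
    changes exactly one integer variable by one unit, within its bounds.
    The continuous part x_c is held constant."""
    neighbors = [(x_c, x_d)]            # by convention, x is its own neighbor
    for i, value in enumerate(x_d):
        for step in (-1, 1):
            new_value = value + step
            if lower[i] <= new_value <= upper[i]:
                y_d = x_d[:i] + (new_value,) + x_d[i + 1:]
                neighbors.append((x_c, y_d))
    return neighbors


# Hypothetical point with one continuous and two integer variables.
nbrs = integer_neighbors((0.5,), (1, 3), lower=(1, 0), upper=(4, 3))
```

The returned set is finite and contains the current point, as required of $\mathcal{N}(x)$ in Section 3.1.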

The basic mixed-variable GPS (MGPS) algorithm for deterministic optimization [13]
conducts three distinct searches embodied in the SEARCH, POLL, and EXTENDED POLL
steps. At iteration $k$, the optional SEARCH step evaluates points from a subset of the mesh,
$S_k \subset M_k(x_k)$, while the POLL step evaluates points from the set $P_k(x_k)$ and the set of discrete
neighbors $\mathcal{N}(x_k)$. Extended polling is conducted after an unsuccessful SEARCH and POLL
step for any point $y \in \mathcal{N}(x_k)$ in the discrete neighbor set of the incumbent that satisfies
$f(y) < f(x_k) + \xi_k$. The term $\xi_k$ is the extended poll trigger at iteration $k$ and must satisfy
$\xi_k \ge \xi > 0$ for some positive scalar $\xi$. The extended poll set of points evaluated about a
particular discrete neighbor $y_k$ is denoted as $\mathcal{E}(y_k) = \{P_k(y_k^j)\}_{j=1}^{J_k}$. Therefore, a poll set
with respect to continuous variables is constructed about $y_k^j$ and the resulting finite number
of extended poll points, indexed by a $j$ superscript, are evaluated until an improvement is
found over $f(x_k)$ or no further improvement can be made in the continuous variable space
near $y_k$. In either case, let $J_k$ denote the total number of extended poll points considered
in the EXTENDED POLL step for discrete neighbor $y_k$. The point $z_k = y_k^{J_k}$ is termed the
extended poll endpoint. The set of all extended poll points considered by the EXTENDED
POLL step at iteration $k$ is defined as

$$\mathcal{X}_k(\xi_k) = \bigcup_{y \in \mathcal{N}_k^{\xi}} \mathcal{E}(y), \tag{3.10}$$

where $\mathcal{N}_k^{\xi} = \{y \in \mathcal{N}(x_k) : f(x_k) \le f(y) < f(x_k) + \xi_k\}$.

A mixed-variable GPS (MGPS) algorithm for deterministic optimization, due to Audet
and Dennis [13], is shown in Figure 3.2. With deterministic function evaluations, the
algorithm evaluates trial points from $S_k \cup P_k(x_k) \cup \mathcal{N}(x_k) \cup \mathcal{X}_k(\xi_k)$ in search of an improved
mesh point. If an improved point is found in any step, the mesh is coarsened or retained;
otherwise, if an improved point is not found from the set $P_k(x_k) \cup \mathcal{N}(x_k) \cup \mathcal{X}_k(\xi_k)$, the
mesh is refined.

The update rules for $\Delta_k$ in the algorithm have important implications for the
convergence analysis. The mesh is updated (refined, coarsened, or retained) according to the
rules found in [1, p. 46]. Refinement must satisfy

$$\Delta_{k+1} = \tau^{m_k^-} \Delta_k, \tag{3.11}$$

where $\tau > 1$ is rational and fixed over all iterations, $0 < \tau^{m_k^-} < 1$, and $m_k^-$ is an integer
satisfying $m_{\min} \le m_k^- \le -1$ for some fixed integer $m_{\min} \le -1$.

Mixed-Variable Generalized Pattern Search (MGPS) Algorithm

Initialization: Choose a feasible starting point $x_0 \in \Theta$. Set $\Delta_0 > 0$ and $\xi > 0$.
Set the iteration counter $k$ to 0. For $k = 0, 1, 2, \ldots$, perform the following:

1. Set extended poll trigger $\xi_k \ge \xi$.

2. Search step (optional): Employ a finite strategy seeking an improved mesh point; i.e., $x_{k+1} \in
M_k(x_k)$ such that $f(x_{k+1}) < f(x_k)$.

3. Poll step: If the search step did not find an improved mesh point, evaluate $f$ at points in
$P_k(x_k) \cup \mathcal{N}(x_k)$ until either an improved mesh point $x_{k+1}$ is found or until the set $P_k(x_k) \cup \mathcal{N}(x_k)$
is exhausted.

4. Extended Poll step: If search and poll did not find an improved mesh point, evaluate $f$ at
points in $\mathcal{X}_k(\xi_k)$ until either an improved mesh point $x_{k+1}$ is found or $\mathcal{X}_k(\xi_k)$ is exhausted.

5. Parameter update: If search, poll, or extended poll finds an improved mesh point, update
$x_{k+1}$ and set $\Delta_{k+1} \ge \Delta_k$; otherwise, set $x_{k+1} = x_k$ and $\Delta_{k+1} < \Delta_k$.

Figure 3.2. MGPS Algorithm for Deterministic Optimization (adapted from [1])

Coarsening after a successful SEARCH, POLL, or EXTENDED POLL step is accomplished
by

$$\Delta_{k+1} = \tau^{m_k^+} \Delta_k, \tag{3.12}$$

where $\tau > 1$ is defined as above and $m_k^+$ is an integer satisfying $0 \le m_k^+ \le m_{\max}$ for some
fixed integer $m_{\max} \ge 0$.

From these rules, it follows that the mesh size parameter at iteration $k$ may be
expressed in terms of the initial mesh size parameter value, i.e.,

$$\Delta_k = \tau^{b_k} \Delta_0 \tag{3.13}$$

for some $b_k \in \mathbb{Z}$, which provides for an orderly algebraic structure of the iterates important
to proving convergence without imposing a sufficient decrease requirement [143].
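The refinement and coarsening rules (3.11) and (3.12) can be sketched with exact rational arithmetic, which makes the structure of (3.13) visible. The choices $\tau = 2$, one-step exponents, and the success pattern below are assumptions for illustration only.

```python
from fractions import Fraction


def update_mesh_size(delta, tau, success, m_plus=1, m_minus=-1):
    """Coarsen by tau**m_plus after a success (3.12); refine by tau**m_minus
    after a failure (3.11). Requires tau > 1, m_plus >= 0, m_minus <= -1."""
    assert tau > 1 and m_plus >= 0 and m_minus <= -1
    return delta * tau ** (m_plus if success else m_minus)


tau = Fraction(2)        # rational, fixed over all iterations
delta = Fraction(1)      # hypothetical initial mesh size
history = [delta]
for success in [False, False, True, False]:   # hypothetical iteration outcomes
    delta = update_mesh_size(delta, tau, success)
    history.append(delta)
```

Every entry of `history` is an integer power of $\tau$ times the initial value, here with exponents $b_k = 0, -1, -2, -1, -2$.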
3.5 Iterate Selection for Noisy Response Functions


For  problems  with  noisy  response  functions,  single-sample  response  comparisons  of 
the  type  used  in  the  algorithm  of  Figure  3.2  can  potentially  lead  to  erroneous  decisions 
due  to  variation  in  the  response.  Alternative  techniques  for  comparing  trial  points  are 
necessary  to  ensure  that  the  iterate  selection  decision  accounts  for  variation  and  provides 
some  statistical  assurances  of  correct  decisions.  In  the  approach  of  Trosset  [146],  iterate 
selection  via  hypothesis  testing  is  suggested  in  which  a  binary  selection  decision  between  the 
incumbent  and  candidate  design  is  based  on  sufficient  statistical  evidence.  This  approach 
is  generalized  in  this  research  by  using  R&S  so  that  multiple  candidates  may  be  considered 
simultaneously  at  reasonable  computational  cost  associated  with  the  requisite  sampling. 
This  approach  provides  the  following  advantages: 

•  It  is  amenable  to  parallelization  techniques  since  several  trial  solutions  can 
be  considered  simultaneously  in  the  selection  process  rather  than  only  two 
(incumbent  and  candidate). 

•  R&S  procedures  detect  the  relative  order,  rather  than  generate  precise  estimates, 
of  the  candidate  solutions.  This  is  generally  easier  to  do  [48]  and  provides 
computational  advantages. 

•  Selection  error  is  limited  to  Type  II  error  only,  i.e.,  making  an  incorrect  selection 
of  the  best  candidate;  Type  I  error  is  eliminated  based  on  the  assumption  of  a 
best  system  among  the  candidates. 

•  The  use  of  an  indifference  zone  parameter  (defined  in  Section  2.1.3)  can  be  easily 
and  efficiently  adapted  for  algorithm  termination. 


The mechanics of a general indifference-zone R&S procedure are developed in this
section so that this construct may be incorporated into the generalized pattern search
algorithm (Section 3.6). At iteration $k$ of the algorithm, consider a finite set $C = \{Y_1, Y_2, \ldots, Y_{n_C}\}
\subset M_k$ of candidate solutions, including the incumbent, such that $n_C \ge 2$. For each
$q = 1, 2, \ldots, n_C$, let $f_q = f(Y_q) = E[F(Y_q, \cdot)]$ denote the true mean of the response
function $F$. As in Section 2.1.3, the collection of these means can be ordered from minimum to
maximum as

$$f_{[1]} \le f_{[2]} \le \cdots \le f_{[n_C]}. \tag{3.14}$$

Again, the notation $Y_{[q]} \in C$ indicates the candidate from $C$ with the $q$th best (lowest) true
objective function value and the probability of correct selection is defined as

$$P\{CS\} = P\{\text{select } Y_{[1]} \mid f_{[q]} - f_{[1]} \ge \delta, \ q = 2, \ldots, n_C\} \ge 1 - \alpha, \tag{3.15}$$

where $\delta$ and $\alpha$ become parameters in the algorithm.

Of course, true objective function values are not available in the current problem
setting, so it is necessary to work with sample means of the response $F$. For each $q =
1, 2, \ldots, n_C$, let $s_q$ be the total number of replications and let $\{F_{qs}\}_{s=1}^{s_q}$ be the set of responses
obtained via simulation, where $F_{qs} = F(Y_{qs})$, $s = 1, \ldots, s_q$. Then for each $q = 1, 2, \ldots, n_C$,
the sample mean $\bar{F}_q$ is computed as

$$\bar{F}_q = \frac{1}{s_q} \sum_{s=1}^{s_q} F_{qs}. \tag{3.16}$$

These sample means may be ordered and indexed the same way as in (3.14). The notation
$\hat{Y}_{[q]} \in C$ is used to denote the candidate with the $q$th best (lowest) estimated objective
function value as determined by the R&S procedure. The candidate corresponding to the
minimum mean response, $\hat{Y}_{[1]} = \arg(\bar{F}_{[1]})$, is selected as the new iterate.

To retain generality of the algorithm class of Section 3.6, Procedure RS($C$, $\alpha$, $\delta$) is
defined in Figure 3.3 as a generic R&S procedure that takes as input a candidate set
$C \subset M_k$, significance level $\alpha$, and indifference zone parameter $\delta$, and returns candidate
$\hat{Y}_{[1]} = \arg(\bar{F}_{[1]})$ as the best. The technique used in Step 1 to determine the number of
samples for each candidate is dependent on the specific procedure. Three specific techniques
were implemented for the computational evaluation and are described in detail in Section
4.1.

Procedure RS($C$, $\alpha$, $\delta$)

Inputs: A set $C = \{Y_1, Y_2, \ldots, Y_{n_C}\}$ of candidate solutions, significance level $\alpha$, and indifference
zone parameter $\delta$.

Step 1: For each candidate $Y_q$, use an appropriate technique to determine the number of samples
$s_q$ required to meet the probability of correct selection guarantee, as a function of $\alpha$, $\delta$ and response
variation of $Y_q$.

Step 2: Obtain sampled responses $F_{qs}$, $q = 1, \ldots, n_C$ and $s = 1, \ldots, s_q$. Calculate the sample means
$\bar{F}_q$ based on the $s_q$ replications according to (3.16). Select the candidate associated with the smallest
estimated sample mean, i.e., $\hat{Y}_{[1]} = \arg(\bar{F}_{[1]})$, as having the $\delta$-near-best mean.

Return: $\hat{Y}_{[1]}$

Figure 3.3. A Generic R&S Procedure
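The structure of the generic procedure in Figure 3.3 can be sketched as below. This is only a skeleton under stated assumptions: `sample_size` is a hypothetical placeholder for the Step 1 rule (a fixed count here, which by itself carries no probability of correct selection guarantee), and `simulate` is a made-up noisy response used purely for illustration.

```python
import random
from statistics import mean


def rs_select(candidates, simulate, sample_size):
    """Return the candidate with the smallest sample mean and that mean."""
    best, best_mean = None, float("inf")
    for y in candidates:
        s_q = sample_size(y)                           # Step 1 (placeholder rule)
        f_bar = mean(simulate(y) for _ in range(s_q))  # Step 2, sample mean (3.16)
        if f_bar < best_mean:
            best, best_mean = y, f_bar
    return best, best_mean


rng = random.Random(0)   # seeded so the illustration is reproducible


def simulate(y):
    """Hypothetical response: true mean (y - 2)^2 plus Gaussian noise."""
    return (y - 2.0) ** 2 + rng.gauss(0.0, 0.1)


best, best_mean = rs_select([0, 1, 2, 3], simulate, sample_size=lambda y: 50)
```

A procedure that actually satisfies (3.15) would replace the fixed sample count with a rule that depends on $\alpha$, $\delta$, and the observed response variation, as Step 1 requires.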

3.6 The MGPS-RS Algorithm for Stochastic Optimization

For  stochastic  response  functions,  procedures  of  the  type  introduced  in  Section  3.5 
are  used  within  the  generalized  pattern  search  framework  to  select  new  iterates.  This 
framework  is  flexible  in  that  a  number  of  specific  R&S  procedures  may  be  used,  so  long  as 
they  satisfy  the  probability  of  correct  selection  guarantee  (3.15). 

A mixed variable GPS ranking and selection (MGPS-RS) algorithm is presented in
Figure 3.4 for mixed variable stochastic optimization problems with bound and linear
constraints on the continuous variables. In the algorithm, binary comparisons of incumbent
and trial designs used in traditional GPS methods are replaced by R&S procedures in
which one candidate is selected from a finite set of candidates considered simultaneously.
The R&S procedures provide error control by ensuring sufficient sampling of the candidates
so that the best or $\delta$-near-best is chosen with probability $1 - \alpha$ or greater.

Mixed Variable Generalized Pattern Search - Ranking & Selection (MGPS-RS) Algorithm

Initialization: Set the iteration counter $k$ to 0. Set the R&S counter $r$ to 0. Choose a feasible
starting point, $X_0 \in \Theta$. Set $\Delta_0 > 0$, $\xi > 0$, $\alpha_0 \in (0, 1)$, and $\delta_0 > 0$.

1. Search step (optional): Employ a finite strategy to select a subset of candidate
solutions, $S_k \subset M_k(X_k)$ defined in (3.8) for evaluation. Use Procedure RS($S_k \cup \{X_k\}$,
$\alpha_r$, $\delta_r$) to return the estimated best solution $\hat{Y}_{[1]} \in S_k \cup \{X_k\}$. Update $\alpha_{r+1} < \alpha_r$,
$\delta_{r+1} < \delta_r$, and $r = r + 1$. If $\hat{Y}_{[1]} \ne X_k$, the step is successful, update $X_{k+1} = \hat{Y}_{[1]}$,
$\Delta_{k+1} \ge \Delta_k$ according to (3.12), and $k = k + 1$ and repeat Step 1. Otherwise, proceed to
Step 2.

2. Poll step: Set extended poll trigger $\xi_k \ge \xi$. Use Procedure RS($P_k(X_k) \cup \mathcal{N}(X_k)$,
$\alpha_r$, $\delta_r$) where $P_k(X_k)$ is defined in (3.9) to return the estimated best solution $\hat{Y}_{[1]} \in
P_k(X_k) \cup \mathcal{N}(X_k)$. Update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If $\hat{Y}_{[1]} \ne X_k$, the step
is successful, update $X_{k+1} = \hat{Y}_{[1]}$, $\Delta_{k+1} \ge \Delta_k$ according to (3.12), and $k = k + 1$ and
return to Step 1. Otherwise, proceed to Step 3.

3. Extended poll step: For each discrete neighbor $Y \in \mathcal{N}(X_k)$ that satisfies the
extended poll trigger condition $\bar{F}(Y) < \bar{F}(X_k) + \xi_k$, set $j = 1$ and $Y_k^j = Y$ and do the
following.

a. Use Procedure RS($P_k(Y_k^j)$, $\alpha_r$, $\delta_r$) to return the estimated best solution $\hat{Y}_{[1]} \in
P_k(Y_k^j)$. Update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If $\hat{Y}_{[1]} \ne Y_k^j$, set $Y_k^{j+1} = \hat{Y}_{[1]}$
and $j = j + 1$ and repeat Step 3a. Otherwise, set $Z_k = Y_k^j$ and proceed to Step
3b.

b. Use Procedure RS($X_k \cup Z_k$) to return the estimated best solution $\hat{Y}_{[1]} = X_k$ or
$\hat{Y}_{[1]} = Z_k$. Update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If $\hat{Y}_{[1]} = Z_k$, the step
is successful, update $X_{k+1} = Z_k$, $\Delta_{k+1} \ge \Delta_k$ according to (3.12), and $k = k + 1$
and return to Step 1. Otherwise, repeat Step 3 for another discrete neighbor that
satisfies the extended poll trigger condition. If no such discrete neighbors remain,
set $X_{k+1} = X_k$, $\Delta_{k+1} < \Delta_k$ according to (3.11), and $k = k + 1$ and return to Step
1.

Figure 3.4. MGPS-RS Algorithm for Stochastic Optimization

The mesh construct of (3.8) defines the set of points in the search domain $\Theta$ from
which the candidates are drawn. In the SEARCH step, the flexibility of GPS allows any
user-defined procedure to be used in determining which candidates from (3.8) to consider.
In the POLL step, the entire poll set about the incumbent (3.9) and the discrete neighbor set
are considered simultaneously. If SEARCH and POLL are unsuccessful, the EXTENDED POLL
step conducts a polling sequence that searches the continuous neighborhood of any discrete
neighbor with a response mean sufficiently close to the response mean of the incumbent.
This step is divided into sub-steps to account for the sequence of R&S procedures that
may be necessary. In Step 3a, each sub-iterate $Y_k^j$, indexed by sub-iteration counter $j$ and
iteration $k$, is selected as the best candidate from the poll set centered about the previous
sub-iterate using the R&S procedure, terminating when the procedure fails to produce
a sub-iterate different from its predecessor. The terminal point of the resulting sequence
$\{Y_k^j\}_{j=1}^{J_k}$, denoted as $Z_k = Y_k^{J_k}$ and termed an extended poll endpoint, is compared to the
incumbent via a separate R&S procedure in Step 3b.
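The sub-iteration loop of Step 3a in Figure 3.4 can be sketched as follows. Several simplifications are assumed: the poll center is included explicitly among the candidates, the mesh size is held fixed, and `select` stands in for Procedure RS (a noise-free argmin is used here so that the trace is reproducible). The names `extended_poll_endpoint`, `true_mean`, and `max_steps` are hypothetical.

```python
def extended_poll_endpoint(y, delta, directions, select, max_steps=1000):
    """Poll about the continuous part of y until `select` returns the current
    center; that center is the extended poll endpoint."""
    center = y
    for _ in range(max_steps):
        x_c, x_d = center
        # Poll set (3.9): move the continuous part, keep the discrete part fixed.
        candidates = [center] + [
            (tuple(c + delta * d_i for c, d_i in zip(x_c, d)), x_d)
            for d in directions
        ]
        best = select(candidates)
        if best == center:      # no sub-iterate different from its predecessor
            return center
        center = best
    return center               # safety stop, an assumption of this sketch


def true_mean(point):
    """Hypothetical mean response for the discrete setting of the neighbor."""
    (u, v), _category = point
    return (u - 1.0) ** 2 + (v + 0.5) ** 2


directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
select = lambda candidates: min(candidates, key=true_mean)
z_k = extended_poll_endpoint(((0.0, 0.0), "A"), 0.5, directions, select)
```

With a genuine R&S procedure in place of `select`, each pass through the loop would also consume one pair $(\alpha_r, \delta_r)$ and advance the counter $r$.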

If  the  extended  poll  trigger  is  set  too  high,  more  extended  poll  steps  result,  thus 
making a solution more “global”. However, the additional sampling required at the extra
points  increases  computational  expense,  particularly  with  high  noise  levels  in  the  response 
output. 

The algorithm maintains a separate counter for R&S parameters $\alpha_r$ and $\delta_r$ to provide
strict enforcement of the rules on these parameters that are updated after each execution
of the R&S procedure. The rules ensure that each parameter tends to zero as the number
of iterations approaches infinity. An additional restriction on $\alpha_r$ is that the infinite series
$\sum_{r=1}^{\infty} \alpha_r$ converges; that is, $\sum_{r=1}^{\infty} \alpha_r < \infty$. These restrictions are critical for convergence
and are justified in Section 3.7.
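One simple schedule that meets these restrictions is geometric decay, sketched below. The geometric form and the values of `alpha_0`, `delta_0`, and `rho` are assumptions for illustration; the source only requires strictly decreasing parameters that tend to zero, with a summable $\alpha_r$.

```python
def rs_parameters(alpha_0, delta_0, rho, r):
    """Geometric schedules alpha_r = alpha_0 * rho**r and delta_r = delta_0 * rho**r
    with 0 < rho < 1, so both decrease strictly and tend to zero."""
    return alpha_0 * rho ** r, delta_0 * rho ** r


alpha_0, delta_0, rho = 0.8, 1.0, 0.95   # hypothetical settings
alphas = [rs_parameters(alpha_0, delta_0, rho, r)[0] for r in range(200)]
deltas = [rs_parameters(alpha_0, delta_0, rho, r)[1] for r in range(200)]

partial_sum = sum(alphas)
series_limit = alpha_0 / (1.0 - rho)     # sum of the full geometric series
```

Because the series is geometric, every partial sum stays below `series_limit`, so the summability condition on $\alpha_r$ holds for this choice.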

The update rules for $\Delta_k$ in the algorithm are the same as the deterministic case.
Refinement  (3.11)  is  accomplished  after  SEARCH  (if  used),  POLL,  and  EXTENDED  POLL  are 
all  unsuccessful.  Coarsening  (3.12)  is  accomplished  after  any  successful  SEARCH,  POLL,  or 
EXTENDED  POLL  step. 

Each execution of the R&S procedure generates an iterate or sub-iterate that is the
candidate returned as the best by the procedure. When the new iterate (sub-iterate) is
different from (presumed better than) the incumbent, the iteration (sub-iteration) is termed
successful; if it remains the same, it is unsuccessful. The use of these terms is in keeping with
traditional pattern search methods where, in a deterministic setting, a success indicates a
strict improvement in the objective function value. Let $V_{r+1}$ denote an iterate or sub-iterate
selected from candidate set $C$ of cardinality $n_C$ by the $r$th R&S procedure of the MGPS-RS
algorithm. Each successful and unsuccessful outcome (iteration or sub-iteration) can
then be further divided into three cases. These cases follow:

1. The outcome is considered successful if one of the following holds:

a. indifference zone condition is met and R&S correctly selects a new incumbent,
i.e.,

$$V_r \ne V_{r+1} = Y_{[1]}, \quad f(Y_{[q]}) - f(Y_{[1]}) \ge \delta_r, \ q = 2, 3, \ldots, n_C; \tag{3.17}$$

b. indifference zone condition is met but R&S incorrectly selects a new incumbent,
i.e.,

$$V_r \ne V_{r+1} \ne Y_{[1]}, \quad f(Y_{[q]}) - f(Y_{[1]}) \ge \delta_r, \ q = 2, 3, \ldots, n_C; \tag{3.18}$$

c. indifference zone condition is not met and R&S selects a new incumbent, i.e.,

$$V_r \ne V_{r+1}, \quad |f(Y_{[q]}) - f(Y_{[1]})| < \delta_r \text{ for some } q \in \{2, 3, \ldots, n_C\}. \tag{3.19}$$

2. The outcome is unsuccessful if one of the following holds:

a. indifference zone condition is met and R&S correctly selects the incumbent, i.e.,

$$V_r = V_{r+1} = Y_{[1]}, \quad f(Y_{[q]}) - f(Y_{[1]}) \ge \delta_r, \ q = 2, 3, \ldots, n_C; \tag{3.20}$$

b. indifference zone condition is met but R&S incorrectly selects the incumbent, i.e.,

$$V_r = V_{r+1} \ne Y_{[1]}, \quad f(Y_{[q]}) - f(Y_{[1]}) \ge \delta_r, \ q = 2, 3, \ldots, n_C; \tag{3.21}$$

c. indifference zone condition is not met and R&S selects the incumbent, i.e.,

$$V_{r+1} = V_r, \quad |f(Y_{[q]}) - f(Y_{[1]})| < \delta_r \text{ for some } q \in \{2, 3, \ldots, n_C\}. \tag{3.22}$$
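The six outcomes can be expressed as a small classifier, sketched here under the assumption that the true means are known (they are not in practice; the sketch is for analysis only). The function name `classify_outcome`, the candidate labels, and the numerical values are hypothetical.

```python
def classify_outcome(v_prev, v_next, true_f, delta):
    """Return '1a', '1b', '1c', '2a', '2b' or '2c' following (3.17)-(3.22).
    true_f maps each candidate to its true objective value."""
    ordered = sorted(true_f, key=true_f.get)
    best = ordered[0]                                   # Y_[1]
    # Indifference zone condition: every other candidate is worse by >= delta.
    zone_met = all(true_f[y] - true_f[best] >= delta for y in ordered[1:])
    outcome = "1" if v_next != v_prev else "2"          # successful / unsuccessful
    if not zone_met:
        return outcome + "c"
    return outcome + ("a" if v_next == best else "b")


true_f = {"A": 1.0, "B": 2.0, "C": 3.5}   # hypothetical true means
labels = [
    classify_outcome("B", "A", true_f, 0.5),   # correct new incumbent
    classify_outcome("A", "B", true_f, 0.5),   # incorrect new incumbent
    classify_outcome("B", "A", true_f, 1.5),   # zone condition not met
    classify_outcome("A", "A", true_f, 0.5),   # incumbent correctly retained
    classify_outcome("B", "B", true_f, 0.5),   # incumbent incorrectly retained
    classify_outcome("A", "A", true_f, 1.5),   # zone condition not met
]
```

Cases a correspond to the deterministic GPS behavior, while cases b and c are the ones whose probability must be controlled.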

In the algorithm, $X_k$ and $Y_k^j$ play the role of $V_r$ for iterates and sub-iterates,
respectively. Of the possible outcomes for new iterates or sub-iterates, conditions (3.17) and (3.20)
conform to the traditional GPS methods for deterministic optimization where, in the case
of a successful iteration, a trial point on the mesh has a better true objective function value
than the incumbent and, in the case of an unsuccessful iteration, the incumbent has the
best true objective function value of all candidates considered. Of particular concern for
the convergence analysis are the remaining conditions.

Conditions (3.19) and (3.22) occur when the difference between true objective function
values of a trial point on the mesh and the incumbent is smaller than the indifference zone
parameter. This situation can result from either an overly relaxed indifference zone or a
flat surface of the true objective function in the region of the search. When this occurs,
the probability of correct selection cannot be guaranteed. However, forcing convergence of
$\delta_r$ to zero via update rules ensures that the indifference zone condition will be met in the
limit. Of greater concern is the case when the indifference zone condition is met, but the
algorithm selects the wrong candidate (i.e., it does not choose the candidate with the best
true objective function value). This case corresponds to conditions (3.18) and (3.21), and occurs
with probability $\alpha_r$ or less for the $r$th R&S procedure. The convergence analysis of the
following section addresses controls placed on the errors presented by these conditions.
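The six outcomes in conditions (3.17)-(3.22) can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify_outcome` and its arguments are hypothetical, the true objective values are taken as known (which is never the case in the stochastic setting), and the best candidate is assumed unique.

```python
def classify_outcome(true_values, incumbent, selected, delta):
    """Label one R&S outcome with the matching condition number (3.17)-(3.22).

    true_values: true objective values f(Y_q), indexed by candidate (assumed known here)
    incumbent:   index of the incumbent V_r
    selected:    index of the candidate V_{r+1} returned by the R&S procedure
    delta:       indifference zone parameter delta_r
    """
    order = sorted(range(len(true_values)), key=lambda q: true_values[q])
    best = order[0]  # index of Y_[1], assumed unique
    # Indifference zone condition: every other candidate is at least delta worse than the best.
    zone_met = all(true_values[q] - true_values[best] >= delta for q in order[1:])
    successful = selected != incumbent
    if not zone_met:
        return "3.19" if successful else "3.22"
    correct = selected == best
    if successful:
        return "3.17" if correct else "3.18"
    return "3.20" if correct else "3.21"
```

For example, with true values `[3.0, 1.0, 2.5]`, incumbent 0, and `delta = 0.5`, selecting candidate 1 is the correct successful outcome (3.17), while selecting candidate 2 is the incorrect successful outcome (3.18).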

3.7  Convergence  Analysis 

In  this  section,  a  convergence  analysis  for  the  MGPS-RS  algorithm  is  given.  The 
following  assumptions  are  required  for  the  analysis: 

A1: All iterates produced by the MGPS-RS algorithm lie in a compact set.

A2: The objective function $f$ is continuously differentiable with respect to the continuous
variables when the discrete variables are fixed.

A3: For each set of discrete variable values, the corresponding set of directions $D^i = G_i Z_i$,
as defined in (3.7), includes tangent cone generators for every point in $\Theta^c$.

A4: The rule for selecting directions $D_k^i$ conforms to $\Theta^c$ for some $\epsilon > 0$ (see Definition
3.10).

A5: For each $q = 1, 2, \ldots, n_C$, the responses are independent, identically and
normally distributed random variables with mean $f(X_q)$ and unknown variance $\sigma_q^2 < \infty$,
where $\sigma_i^2 \neq \sigma_q^2$ whenever $i \neq q$.

A6: The sequence of significance levels $\{\alpha_r\}$ satisfies $\sum_{r=1}^{\infty} \alpha_r < \infty$, and the sequence of
indifference zone parameters $\{\delta_r\}$ satisfies $\lim_{r \to \infty} \delta_r = 0$.

A7: For the $r$th R&S procedure considering candidate set $C = \{Y_1, Y_2, \ldots, Y_{n_C}\}$, Procedure
RS($C$, $\alpha_r$, $\delta_r$) guarantees correctly selecting the best candidate $Y_{[1]} \in C$ with probability
of at least $1 - \alpha_r$ whenever $f(Y_{[q]}) - f(Y_{[1]}) \geq \delta_r$ for any $q \in \{2, 3, \ldots, n_C\}$.

A8: For all but a finite number of MGPS-RS iterations and sub-iterations, the best solution
$Y_{[1]} \in C$ is unique; i.e., $f(Y_{[1]}) \neq f(Y_{[q]})$ for all $q \in \{2, 3, \ldots, n_C\}$ where
$C = \{Y_1, Y_2, \ldots, Y_{n_C}\} \subset M_k(X_k)$ at iteration $k$.

These assumptions warrant a brief discussion. Assumption A1 is a fairly standard
assumption, and is easily enforced by including finite upper and lower bounds on the
continuous variables, which is very common in practice. Assumption A3 ensures that the
restriction on the direction set (3.7) is maintained in the presence of linear constraints, and
assumption A4 provides for adequate rules to generate conforming directions. A sufficient
condition for assumption A3 to hold is that $G_i = I$ for each $i \in \{1, \ldots, i_{\max}\}$ and the
coefficient matrix $A$ is rational [1, p. 73]. The independent, normally distributed requirement
for responses from a single alternative in assumption A5 is common for R&S techniques
and is readily achieved in simulation via batched output data or sample averages of
independent replications [99]. Furthermore, unequal variances between different alternatives are
realistic for practical problems and are readily handled with modern R&S procedures.
Assumption A6 is a requirement levied to enable the convergence proofs in this section.
Assumption A7 provides the correct selection guarantee of the R&S procedure and is required
in the absence of identifying a specific method. Most R&S procedures are accompanied by
proofs that the correct selection guarantee is met. MGPS-RS is flexible in that any R&S
procedure may be used, so long as it satisfies assumption A7. Finally, assumption A8 is
required to ensure that the indifference zone condition is eventually met during the course
of the iteration sequence. This assumption may seem restrictive, but two candidate mesh
points rarely have exactly the same objective function value in non-academic problems.
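The batching idea mentioned for assumption A5 can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `batch_means` is hypothetical, and the batch size needed for the batch averages to be approximately independent and normal depends on the simulation output and is not specified here.

```python
import statistics

def batch_means(outputs, batch_size):
    """Average consecutive, non-overlapping batches of raw simulation outputs.

    Trailing observations that do not fill a complete batch are discarded.
    The batch averages, rather than the raw outputs, would then be passed
    to the R&S procedure as responses.
    """
    n_batches = len(outputs) // batch_size
    return [
        statistics.fmean(outputs[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]
```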

Since  MGPS-RS  iterates  are  random  variables,  the  convergence  analysis  must  be  car¬ 
ried  out  in  probabilistic  terms.  To  that  end,  the  following  definition  provides  what  is  needed 
for  iterates  in  a  mixed  variable  domain,  and  is  consistent  with  Definition  3.3. 

Definition 3.11 (Almost Sure Convergence, Limit Point) Let $\Theta \subseteq (\mathbb{R}^{n^c} \times \mathbb{Z}^{n^d})$ be a mixed
variable domain. A sequence of multivariate random vectors $\{X_k\}$ converges almost surely
(a.s.) to the limit point $x \in \Theta$ if, for every $\epsilon > 0$, there exists a positive integer $N$ such
that $P(X_k^d = x^d) = 1$ and $P(\|X_k^c - x^c\| < \epsilon) = 1$ for all $k > N$.

3.7.1 Controlling Incorrect Selections

Random variation in the responses leads to errors in the iterate selection decision in
the form of incorrect selections. The concept of an incorrectly selected MGPS-RS iterate
or sub-iterate was formalized by conditions (3.18) and (3.21). Let $A_r$ denote the incorrect
selection event that “$V_{r+1}$ is incorrectly selected by the $r$th R&S procedure in the MGPS-RS
algorithm”. For convergence of the iteration sequence $\{X_k\}$, a means of bounding the
sequence of incorrect selection events $\{A_r\}$ is necessary so that the sequence of iterates is not
dominated by incorrectly selected (and possibly unimproving) candidates. The restriction
on the sequence of significance levels in assumption A6, along with the first half of the
Borel-Cantelli lemma, provides this means.
Lemma 3.12 (Borel-Cantelli) Let $\{B_r\}$ be an infinite sequence of random events. If

$$\sum_{r=1}^{\infty} P(B_r) < \infty,$$

then

$$P(B_r \text{ i.o.}) = 0.$$

The term “i.o.” stands for infinitely often, so that the event $[B_r \text{ i.o.}]$ can be interpreted
as the event “$B_r$ happens for infinitely many values of $r$”. Note that there is no requirement
for the events $B_r$ to be independent or identically distributed. A proof of this lemma can
be found in [113, p. 102].

Lemma 3.13 With probability 1, the subsequence of incorrectly selected iterates and
sub-iterates generated by algorithm MGPS-RS is finite.

Proof. Let $A_r$ denote the occurrence of the event that the $r$th R&S procedure incorrectly
selects the next iterate or sub-iterate. The complement of assumption A7 yields $P(A_r) \leq \alpha_r$,
$r = 1, 2, \ldots$, and assumption A6 ensures that $\sum_{r=1}^{\infty} P(A_r) \leq \sum_{r=1}^{\infty} \alpha_r < \infty$. The result
follows directly from Lemma 3.12. ■

The restriction on $\alpha_r$ can be enforced in practice through an appropriately selected
update rule. For example, the update rule $\alpha_r = \alpha_0 \rho^r$ for $0 < \rho < 1$ and $\alpha_0 > 0$ results in a
geometric series that converges, since $\sum_{r=1}^{\infty} \alpha_0 \rho^r = \alpha_0 \rho / (1 - \rho) < \infty$. Using this rule with
$\alpha_0 < 1$, as required by the R&S procedure, the rate at which $\alpha_r$ converges to zero can be controlled
by the parameter $\rho$. Values for $\rho$ closer to zero result in faster convergence than those that
are closer to one.
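A geometric update rule of this kind can be checked numerically. The sketch below is illustrative only: the function names are hypothetical and the values $\alpha_0 = 0.4$ and $\rho = 0.95$ are assumed for the demonstration. Because $P(A_r) \leq \alpha_r$, the closed-form sum also bounds the expected number of incorrect selections.

```python
def alpha_schedule(alpha0, rho, n):
    """Significance levels alpha_r = alpha0 * rho**r for r = 1, ..., n."""
    return [alpha0 * rho**r for r in range(1, n + 1)]

def geometric_limit(alpha0, rho):
    """Closed-form value of the infinite sum of alpha0 * rho**r over r >= 1."""
    return alpha0 * rho / (1.0 - rho)

# Assumed illustrative parameters, not prescribed by the analysis.
alphas = alpha_schedule(0.4, 0.95, 500)
partial = sum(alphas)                 # partial sum over the first 500 R&S procedures
limit = geometric_limit(0.4, 0.95)    # finite bound on the full series
```

Choosing a smaller `rho` lowers `limit`, which tightens the bound on incorrect selections at the cost of more demanding significance levels early in the run.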

A final consideration involving incorrect selections is required to enable analysis of
the mesh size parameter. In particular, it is necessary to establish that MGPS-RS cannot
cycle indefinitely among iterates that belong to $\Theta$. Such a condition occurs if and only if it
is possible to have infinitely many consecutive successful iterations. The following lemma
establishes the result.

Lemma 3.14 With probability 1, the number of consecutive successful MGPS-RS iterations
must be finite.

Proof. Let $K_s$ represent the number of consecutive successful iterations of MGPS-RS after iteration $k$.
From conditions (3.17)-(3.19), $K_s = K_\delta + K_c + K_i$ where $K_\delta$ is the number of successful
iterates until the indifference zone condition is satisfied (3.19), $K_c$ is the number of correctly
selected successful iterates (3.17), and $K_i$ is the number of incorrectly selected successful
iterates (3.18). Assumptions A6 and A8 ensure that $K_\delta < \infty$. Furthermore, since assumption
A1 ensures that all iterates lie in a compact set, it must follow that $K_c < \infty$. Finally,
since the incorrectly selected successful iterates are a subset of all incorrect selections
(successful and unsuccessful), Lemma 3.13 ensures that $P(K_i < \infty) = 1$. It follows
that

$$P(K_s < \infty) = P(K_\delta + K_c + K_i < \infty) = P(K_i < \infty) = 1. \; ■$$

3.7.2  Mesh  Size  Behavior 

The main result of this section is that, with probability one, there exists a subsequence
of mesh size parameters that goes to zero, i.e., $P(\liminf_{k \to +\infty} \Delta_k = 0) = 1$, which is independent
of any smoothness assumptions on the objective function. This result was first established
by Torczon [143] and subsequently modified for MVP problems by Audet and Dennis [13].
Audet and Dennis later adapted a lemma of Torczon [143] to provide a lower bound on the
distance between any two mesh points at each iteration for continuous-variable problems
[14], which was then extended by Abramson [1] to MVP problems. This lower bound is
stated in Lemma 3.15, the result of which is necessary to show that the mesh size parameter
is bounded above in Lemma 3.16. Finally, Theorem 3.17 presents the key result for this
section. The proof of Lemma 3.15 is independent of response noise but is included, as found
in [1], for completeness. The proofs for Lemma 3.16 and Theorem 3.17 are modified from
[1] to account for stochastic responses.


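Lemma 3.15 For any $k \geq 0$, $k \in \mathbb{Z}$, let $u$ and $v$ be any pair of distinct mesh points such
that $u^d = v^d$. Then for any norm for which all nonzero integer vectors have norm at least 1,

$$\|u^c - v^c\| \geq \frac{\Delta_k}{\|G_i^{-1}\|}$$

where the index $i$ corresponds to the combination of discrete variable values defined by
$u^d = v^d$.

Proof. From (3.8), $u, v \in M_k(X_k) = \Theta^d \times \bigcup_{i=1}^{i_{\max}} \{X_k^c + \Delta_k D^i z \in \Theta^c : z \in \mathbb{Z}_+^{|D^i|}\}$. Let
$u^c = X_k^c + \Delta_k D^i z_u$ and $v^c = X_k^c + \Delta_k D^i z_v$ where $z_u, z_v \in \mathbb{Z}_+^{|D^i|}$. Since $u \neq v$ but
$u^d = v^d$, then $u^c \neq v^c$. It follows that $z_u \neq z_v$. Then,

$$\begin{aligned}
\|u^c - v^c\| &= \|X_k^c + \Delta_k D^i z_u - X_k^c - \Delta_k D^i z_v\| \\
&= \Delta_k \|D^i (z_u - z_v)\| \\
&= \Delta_k \|G_i Z_i (z_u - z_v)\| \\
&\geq \Delta_k \frac{\|Z_i (z_u - z_v)\|}{\|G_i^{-1}\|} \\
&\geq \frac{\Delta_k}{\|G_i^{-1}\|}.
\end{aligned}$$

The last inequality holds because $\|Z_i (z_u - z_v)\| \geq 1$; i.e., $Z_i (z_u - z_v)$ is a nonzero integer
vector with norm at least 1. ■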
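The lower bound of Lemma 3.15 can be checked by brute force on a small case. The sketch below is an illustration under stated assumptions: the matrix `G`, the mesh size `delta_k`, and the search range are hypothetical, $Z_i$ is taken to be the identity, and the infinity norm is used because every nonzero integer vector has infinity norm at least 1.

```python
from itertools import product

def inf_norm(v):
    """Infinity norm of a vector."""
    return max(abs(x) for x in v)

def mat_vec(G, w):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(G[i][j] * w[j] for j in range(len(w))) for i in range(len(G))]

def inv_2x2(G):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def op_inf_norm(M):
    """Operator norm induced by the infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

G = [[2.0, 1.0], [0.0, 0.5]]   # hypothetical nonsingular G_i
delta_k = 0.25                 # hypothetical mesh size parameter
bound = delta_k / op_inf_norm(inv_2x2(G))

# Smallest separation between distinct mesh points: w plays the role of z_u - z_v.
min_sep = min(
    delta_k * inf_norm(mat_vec(G, w))
    for w in product(range(-3, 4), repeat=2)
    if any(w)
)
```

Here `min_sep` is the smallest distance found between distinct mesh points in the enumerated range, and the lemma asserts it can never fall below `bound`.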


Lemma 3.16 With probability 1, there exists a positive integer $b^u < \infty$ such that
$\Delta_k \leq \Delta_0 \tau^{b^u}$ for any $k \geq 0$, $k \in \mathbb{Z}$.

Proof. By assumption A1, the search domain is bounded so the discrete variables can only
take on a finite number of values. Let $i_{\max}$ denote this number and let $I = \{1, \ldots, i_{\max}\}$.
Also under assumption A1, for each $i \in I$, let $A_i$ be a compact set in $\mathbb{R}^{n^c}$ containing
all MGPS-RS iterates whose discrete variable values correspond to $i \in I$. Let
$\gamma = \max_{i \in I} \operatorname{diam}(A_i)$ and $\beta = \max_{i \in I} \|G_i^{-1}\|$ where $\operatorname{diam}(\cdot)$ denotes the maximum distance between any
two points in the set. If $\Delta_k > \gamma\beta$, then by Lemma 3.15 (with $v = X_k$), any mesh point $u$
with $u^c \neq X_k^c$ would be outside of $A_i$. This can be seen by the following:

$$\|u^c - X_k^c\| \geq \frac{\Delta_k}{\|G_i^{-1}\|} > \frac{\gamma \max_{i \in I} \|G_i^{-1}\|}{\|G_i^{-1}\|} \geq \gamma = \max_{i \in I} \operatorname{diam}(A_i) \tag{3.23}$$

$$\Rightarrow \|u^c - X_k^c\| > \max_{i \in I} \operatorname{diam}(A_i).$$

Thus, $\Delta_k > \gamma\beta$ implies that the continuous part of the mesh is devoid of candidates except
for the incumbent. Therefore, $M_k(X_k) = \Theta^d \times \{X_k^c\}$ and $P_k(X_k) = \{X_k\}$. Furthermore,
the poll set for any discrete neighbor $Y$ of $X_k$ is devoid of candidates except for $Y$ by the
same argument as (3.23) using Lemma 3.15 (with $v = Y$), so the EXTENDED POLL step is
avoided.

The algorithm can consider a maximum of $i_{\max}$ different candidates defined by the
combinations of $\Theta^d$ during a SEARCH or POLL step. The mesh size parameter grows without
bound only if it is possible to cycle indefinitely between these $i_{\max}$ solutions. But Lemma
3.14 guarantees $P(K_s < \infty) = 1$ where $K_s$ is the number of consecutive successful iterations
after iteration $k$. Then the mesh size parameter will have grown, at a maximum, by a
factor of $(\tau^{m_{\max}})^{K_s}$ and is thus bounded above by $\gamma\beta(\tau^{m_{\max}})^{K_s}$. Let $b^u$ be large enough so
that $\Delta_0 \tau^{b^u} \geq \gamma\beta(\tau^{m_{\max}})^{K_s}$. Then $P(K_s < \infty) = 1 \Rightarrow P(\gamma\beta(\tau^{m_{\max}})^{K_s} < \infty) = 1 \Rightarrow
P(\Delta_0 \tau^{b^u} < \infty) = 1 \Rightarrow P(b^u < \infty) = 1$. ■


Theorem 3.17 The mesh size parameters satisfy $P\left(\liminf_{k \to +\infty} \Delta_k = 0\right) = 1$.

Proof. By way of contradiction, suppose there exists a negative integer $b^\ell$ such that
$\Delta_0 \tau^{b^\ell} > 0$ and $P(\Delta_k > \Delta_0 \tau^{b^\ell}) = 1$ for all $k \geq 0$, $k \in \mathbb{Z}$. By definition of the update rules,
$\Delta_k$ can be expressed as $\Delta_k = \Delta_0 \tau^{b_k}$ for some $b_k \in \mathbb{Z}$ (see (3.13)). Since Lemma 3.16 ensures
that $b_k$ is bounded above a.s. by $b^u$, it follows that $b_k \in \{b^\ell, b^\ell + 1, \ldots, b^u\}$ a.s. Thus, $b_k$
is an element of a finite set of integers, which implies that $\Delta_k$ takes on a finite number of
values for all $k \geq 0$.

Now, $X_{k+1} \in M_k$ ensures that $X_{k+1}^c = X_k^c + \Delta_k D^i z_k$ for some $z_k \in \mathbb{Z}_+^{|D^i|}$ and some
$i \in \{1, 2, \ldots, i_{\max}\}$. Repeated application of this equation leads to the following result over
a fixed $i$ at iteration $N \geq 1$, where $p$ and $q$ are relatively prime integers satisfying $\tau = p/q$:

$$\begin{aligned}
X_N^c &= X_{N-1}^c + \Delta_{N-1} D^i z_{N-1} \\
&= (X_{N-2}^c + \Delta_{N-2} D^i z_{N-2}) + \Delta_{N-1} D^i z_{N-1} \\
&\;\;\vdots \\
&= X_0^c + \Delta_0 D^i z_0 + \Delta_1 D^i z_1 + \cdots + \Delta_{N-1} D^i z_{N-1} \\
&= X_0^c + \sum_{k=0}^{N-1} \Delta_k D^i z_k \\
&= X_0^c + D^i \sum_{k=0}^{N-1} \Delta_0 \tau^{b_k} z_k \\
&= X_0^c + \Delta_0 D^i \sum_{k=0}^{N-1} \left(\frac{p}{q}\right)^{b_k} z_k \\
&= X_0^c + \frac{p^{b^\ell}}{q^{b^u}} \Delta_0 D^i \sum_{k=0}^{N-1} p^{b_k - b^\ell} q^{b^u - b_k} z_k.
\end{aligned}$$

Since $p^{b_k - b^\ell}$ and $q^{b^u - b_k}$ are both integers, then $\sum_{k=0}^{N-1} p^{b_k - b^\ell} q^{b^u - b_k} z_k$ is a $|D^i|$-dimensional
vector of integers (recall $z_k \in \mathbb{Z}_+^{|D^i|}$). So, the continuous part of each iterate, $X_k^c$, $k =
0, \ldots, N$, having the same discrete variable values defined by $i$ lies on the translated integer
lattice generated by $X_0^c$ and the columns of $\frac{p^{b^\ell}}{q^{b^u}} \Delta_0 D^i$. Furthermore, the discrete part of
each iterate, $X_k^d$, lies on the integer lattice $\mathbb{Z}^{n^d} \subset \mathbb{R}^{n^d}$. By assumption A1, all iterates belong
to a compact set, so there must be only a finite number of possible iterates.

Lemma 3.14 ensures that the algorithm cannot cycle indefinitely between these points
(i.e., the subsequence of consecutive successful iterations is finite a.s.). Thus, as $k \to +\infty$,
one of the iterates must be visited infinitely many times a.s., which implies an infinite
number of mesh refinements. But this contradicts the hypothesis that $P(\Delta_k > \Delta_0 \tau^{b^\ell}) = 1$
as $k \to +\infty$. Therefore, $P(\Delta_k > \Delta_0 \tau^{b^\ell}) = 0$, which implies $P\left(\liminf_{k \to +\infty} \Delta_k = 0\right) = 1$. ■

The results of this section illustrate the importance of the restriction that $D^i = G_i Z_i$
(Equation (3.7)). Under assumption A1, this ensures that the mesh has a finite number of
points in $\Theta$. This, combined with the “finiteness” of incorrectly selected iterates, ensures
that there can only be a finite number of consecutive successful iterations.
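The lattice argument in the proof of Theorem 3.17 rests on factoring a common rational scale out of the accumulated steps. The following sketch verifies that identity with exact rational arithmetic in one dimension. All numerical values are hypothetical: `tau`, `delta0`, the exponents `b`, and the signed integers `z` (which stand in for the products $D^i z_k$) were chosen only for illustration.

```python
from fractions import Fraction

tau = Fraction(3, 2)            # hypothetical rational tau = p/q
p, q = tau.numerator, tau.denominator
delta0 = Fraction(1, 2)         # hypothetical initial mesh size
b = [0, 1, -1, -2, 0, 1]        # hypothetical exponents, Delta_k = delta0 * tau**b_k
z = [3, -1, 2, 5, -4, 1]        # hypothetical integer steps in one dimension
b_lo, b_hi = min(b), max(b)     # play the roles of b^l and b^u

# Accumulated displacement: sum over k of Delta_k * z_k.
displacement = sum(delta0 * tau**bk * zk for bk, zk in zip(b, z))

# Factor out p**b_lo / q**b_hi; what remains is a sum of integers.
scale = delta0 * Fraction(p)**b_lo / Fraction(q)**b_hi
integer_part = sum(p**(bk - b_lo) * q**(b_hi - bk) * zk for bk, zk in zip(b, z))
```

Since `integer_part` is an integer, every reachable point is an integer multiple of `scale` away from the starting point, which is the one-dimensional analogue of the translated lattice in the proof.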

3.7.3 Main Results

In  this  section,  the  existence  of  limit  points  for  MGPS-RS  iterates  is  proven.  In 
addition,  limit  points  are  shown  to  satisfy  the  first-order  necessary  conditions  for  optimality 
in  Definition  3.2.  The  results  have  been  modified  from  [13]  and  [1]  to  accommodate  the 
new  algorithmic  framework.  The  following  definition,  which  distinguishes  a  subsequence  of 
the  unsuccessful  iterates,  simplifies  the  analysis. 


Definition 3.18 (Refining subsequence) A subsequence of unsuccessful MGPS-RS iterates
$\{X_k\}_{k \in K}$ (for some subset of indices $K$) is said to be a refining subsequence if
$\{\Delta_k\}_{k \in K}$ converges almost surely to zero, i.e., $P\left(\lim_{k \in K} \Delta_k = 0\right) = 1$.

Since $\Delta_k$ shrinks only at unsuccessful iterations, Theorem 3.17 guarantees that the MGPS-RS
algorithm has, with probability 1, infinitely many such iterations. The next theorem,
similar to the results from [13] and [1] but modified here for the probabilistic setting,
establishes the existence of certain limit points associated with refining subsequences.

Theorem 3.19 There exists a point $\hat{x} \in \Theta$ and a refining subsequence $\{X_k\}_{k \in K}$, with
associated index set $K \subset \{k : X_{k+1} = X_k\}$ such that $\{X_k\}_{k \in K}$ converges almost surely to
$\hat{x}$. Moreover, if $\mathcal{N}$ is continuous at $\hat{x}$, then there exists $\hat{y} \in \mathcal{N}(\hat{x})$ and $\hat{z} = (\hat{z}^c, \hat{y}^d) \in \Theta$ such
that $\{Y_k\}_{k \in K}$ converges almost surely to $\hat{y}$ and $\{Z_k\}_{k \in K}$ converges almost surely to $\hat{z}$ where
each $Z_k \in \Theta$ is an EXTENDED POLL endpoint initiated at $Y_k \in \mathcal{N}(X_k)$.

Proof. Theorem 3.17 guarantees $P\left(\liminf_{k \to +\infty} \Delta_k = 0\right) = 1$; thus there is an infinite subset of
indices of unsuccessful iterates $K' \subset \{k : X_{k+1} = X_k\}$, such that the subsequence $\{\Delta_k\}_{k \in K'}$
converges a.s. to zero, i.e., $P\left(\lim_{k \in K'} \Delta_k = 0\right) = 1$. Since all iterates $X_k$ lie in a compact
set, there exists an infinite subset of indices $K'' \subset K'$ such that the subsequence $\{X_k\}_{k \in K''}$
converges almost surely. Let $\hat{x}$ be the limit point of such a subsequence.

The continuity of $\mathcal{N}$ at $\hat{x}$ guarantees that $\hat{y} \in \mathcal{N}(\hat{x}) \subset \Theta$ is a limit point of a
subsequence $Y_k \in \mathcal{N}(X_k)$. Let $\hat{z} \in \Theta$ be a limit point of the sequence $Z_k \in \Theta$ of EXTENDED
POLL endpoints initiated at $Y_k$. Choose $K \subset K''$ to be such that both $\{Y_k\}_{k \in K}$ converges
a.s. to $\hat{y}$ and $\{Z_k\}_{k \in K}$ converges a.s., letting $\hat{z}$ denote the limit point. ■

For the remainder of the analysis, it is assumed that $\hat{x}$ and $K$ satisfy the conditions
of Theorem 3.19. The following lemma establishes the first main result, showing that limit
points satisfy necessary condition 2 of Definition 3.2. The direct proof is modified for the
stochastic case from [1], where it was presented as an alternative to the proof by contradiction
in [13].

Lemma 3.20 If $\mathcal{N}$ is continuous at the limit point $\hat{x}$, then $\hat{x}$ satisfies $f(\hat{x}) \leq f(\hat{y})$ a.s. for
all $\hat{y} \in \mathcal{N}(\hat{x})$.

Proof. From Theorem 3.19, the sequences $\{X_k\}_{k \in K}$ and $\{Y_k\}_{k \in K}$ converge a.s. to $\hat{x}$ and $\hat{y}$,
respectively. Since $k \in K \subset \{k : X_{k+1} = X_k\}$, each $X_k$, $k \in K$, meets one of the conditions
(3.20)-(3.22). Assumption A8 ensures that the number of iterates satisfying condition
(3.22) is finite. Furthermore, since the set of iterates meeting condition (3.21) is a subset of
all incorrectly selected iterates, Lemma 3.13 ensures the number of iterates satisfying this
condition is finite almost surely. Therefore, the number of correctly selected iterates in
$\{X_k\}_{k \in K}$ meeting condition (3.20) must be infinite. Let $k'$ denote an unsuccessful iteration
after the last occurrence of both conditions (3.21) and (3.22) and let $K' = K \cap \{k \geq k'\}$,
so that $\{X_k\}_{k \in K'}$ converges a.s. to $\hat{x}$. Since each iterate $X_k$, $k \in K'$, meets condition (3.20),
$f(X_k) \leq f(Y_k)$ for all $k \in K'$. By the continuity of $\mathcal{N}$ and assumption A2,
$f(\hat{x}) = \lim_{k \in K'} f(X_k) \leq \lim_{k \in K'} f(Y_k) = f(\hat{y})$. ■

The following lemma is necessary to show stationarity of the iterates $X_k$ and EXTENDED
POLL endpoints $Z_k$. It merges two lemmas from [13] and modifies the results
therein for the new algorithmic framework.

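Lemma 3.21 Let $\hat{w}$ be the limit point of a refining subsequence $\{W_k\}_{k \in K}$. Then
$(w^c - \hat{w}^c)^T \nabla^c f(\hat{w}) \geq 0$ a.s. for any feasible $(w^c, \hat{w}^d)$.

Proof. By assumption A2, the mean value theorem applies, i.e., for points $x_1$ and $x_2$
satisfying $x_1^d = x_2^d$,

$$f(x_2) = f(x_1) + (x_2^c - x_1^c)^T \nabla^c f(x) \quad \text{where } x^c \in [x_1^c, x_2^c].$$

For $x_1 = W_k$, $x_2 = V = W_k + \Delta_k (d, 0) \in P_k(W_k)$, and any $d \in D_k^i \subset D^i$ that is feasible
infinitely often, substitution yields

$$f(V) = f(W_k + \Delta_k (d, 0)) = f(W_k) + \Delta_k d^T \nabla^c f(W_k + \lambda_k^d \Delta_k (d, 0)) \tag{3.24}$$

for $\lambda_k^d \in [0, 1]$ that depends on the iteration $k$ and positive basis vector $d$. Choose $k \in K$
large enough so that the indifference zone condition is satisfied and incorrect selections have
terminated almost surely. Then by condition (3.20), $f(V) - f(W_k) \geq \delta_{r(k)}$ where $\delta_{r(k)}$
depends on $k$. Furthermore,

$$\begin{aligned}
f(W_k) &\leq \min_{V \in P_k(W_k)} f(V) - \delta_{r(k)} \\
&= \min_{d \in D_k^i} \left\{ f(W_k) + \Delta_k d^T \nabla^c f(W_k + \lambda_k^d \Delta_k (d, 0)) \right\} - \delta_{r(k)} \\
&= f(W_k) - \delta_{r(k)} + \Delta_k \min_{d \in D_k^i} \left\{ d^T \nabla^c f(W_k + \lambda_k^d \Delta_k (d, 0)) \right\},
\end{aligned}$$

which implies that

$$\min_{d \in D_k^i} \left\{ d^T \nabla^c f(W_k + \lambda_k^d \Delta_k (d, 0)) \right\} \geq \frac{\delta_{r(k)}}{\Delta_k}.$$

Taking the limit as $k \to \infty$ (in $K$) yields $\min_{d \in D^i} \left\{ d^T \nabla^c f(\hat{w}) \right\} \geq 0$ a.s. (since $\lim_{k \to \infty} \delta_{r(k)} = 0$
by assumption A6). Therefore, $d^T \nabla^c f(\hat{w}) \geq 0$ a.s. for any $d \in D^i$ that is feasible
infinitely often.

By assumption A4, any feasible direction $(w^c - \hat{w}^c)$ is a nonnegative linear combination
of feasible directions in $D^i$ that span the tangent cone of $\Theta^c$ at $\hat{w}$. Then for $\rho_j \geq 0$,
$j = 1, 2, \ldots, n_d$, $(w^c - \hat{w}^c) = \sum_{j=1}^{n_d} \rho_j d_j$ and

$$(w^c - \hat{w}^c)^T \nabla^c f(\hat{w}) = \sum_{j=1}^{n_d} \rho_j d_j^T \nabla^c f(\hat{w}) \geq 0 \text{ a.s.} \; ■$$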

It is now possible to state the second main result. Lemma 3.22 shows that the limit
point $\hat{x}$ satisfies condition 1 of Definition 3.2.

Lemma 3.22 The limit point $\hat{x}$ satisfies $(x^c - \hat{x}^c)^T \nabla^c f(\hat{x}) \geq 0$ a.s. for any feasible $(x^c, \hat{x}^d)$.

Proof. The result follows directly from Lemma 3.21 by substituting $\{X_k\}_{k \in K}$ for $\{W_k\}_{k \in K}$ as the
refining subsequence, and from results on the sequence $\{X_k\}_{k \in K}$ of Theorem 3.19. ■

The remaining result may now be completed. Lemma 3.23 shows that limit points
$\hat{x}$ and discrete neighbors $\hat{y}$ that satisfy $f(\hat{y}) = f(\hat{x})$ meet condition 3 of Definition 3.2.
Theorem 3.24 collects all the main results into a single theorem.

Lemma 3.23 The limit point $\hat{x}$ and any point $\hat{y}$ in the set of neighbors $\mathcal{N}(\hat{x})$ satisfying
$f(\hat{y}) = f(\hat{x})$, are such that $(y^c - \hat{y}^c)^T \nabla^c f(\hat{y}) \geq 0$ a.s. for any feasible $(y^c, \hat{y}^d)$.

Proof. Choose $k' \in K$ large enough so that the indifference zone condition is satisfied
and incorrect selections have terminated almost surely and let $K' = K \cap \{k \geq k'\}$. Then
by condition (3.17), $f(Y_k^j) < f(Y_k^{j-1})$ for all $k \in K'$, which implies $f(Z_k) < f(Y_k)$ for
all $k \in K'$. Furthermore, since $K'$ is a subset of unsuccessful iterates, condition (3.20) is
satisfied, which implies $f(X_k) < f(Z_k)$ for each $k \in K'$. By continuity of $f$ and taking the
limit as $k \to \infty$ (in $K'$), it follows that $f(\hat{x}) \leq f(\hat{z}) \leq f(\hat{y})$. Since $f(\hat{y}) = f(\hat{x})$ by
hypothesis, $f(\hat{z}) = f(\hat{y})$.

By the differentiability of $f$, it follows that

$$\begin{aligned}
(y^c - \hat{y}^c)^T \nabla^c f(\hat{y}) &= f'(\hat{y}; (y^c - \hat{y}^c, 0)) = \lim_{t \to 0} \frac{f(\hat{y} + t(y^c - \hat{y}^c, 0)) - f(\hat{y})}{t} \\
&= \lim_{k \in K'} \frac{\cdots}{\Delta_k} = \lim_{k \in K'} \frac{\cdots}{\Delta_k} \geq \lim_{k \in K'} \frac{\cdots}{\Delta_k} \\
&= (z^c - \hat{z}^c)^T \nabla^c f(\hat{z}) \geq 0,
\end{aligned}$$

where $f'(\hat{y}; (y^c - \hat{y}^c, 0))$ denotes the directional derivative of $f$ at $\hat{y}$ in the direction $(y^c - \hat{y}^c, 0)$,
and the last inequality follows by substituting $\{Z_k\}_{k \in K'}$ for $\{W_k\}_{k \in K}$ as the refining subsequence in
Lemma 3.21. ■


Theorem 3.24 The limit point $\hat{x}$ satisfies first-order necessary conditions for optimality
a.s.

Proof. The result, based on conditions 1-3 of Definition 3.2, follows directly from Lemmas
3.20, 3.22, and 3.23. ■

3.8  Illustrative  Example 

Prior to a comprehensive computational evaluation of specific algorithm implementations
in Chapter 5, a basic version of the algorithm is illustrated on a small unconstrained
problem with two continuous variables and one discrete (binary) variable. Consider the
response function

$$F(x) = f(x) + N(0, \sigma^2(f(x))) \tag{3.25}$$

where $N(0, \sigma^2(f(x)))$ is a normally distributed, mean-zero noise term added to an underlying
true objective function. The variance $\sigma^2$ of the noise depends on the true function $f(x)$,
which is defined over $(x_1, x_2)^T \in \mathbb{R}^2$ and $x_3 \in \{0, 1\}$ as

$$f(x) = f_1(x_1, x_2)(1 - x_3) + f_2(x_1, x_2) x_3 \tag{3.26}$$

where the functions $f_1$ and $f_2$ are overlapping quadratic functions (see Figure 3.5) as follows:

$$f_1(x_1, x_2) = (x_1 - 9/4)^2 + (x_2 - 9/4)^2 + 1,$$
$$f_2(x_1, x_2) = \tfrac{1}{2}(x_1 - 3/2)^2 + \tfrac{1}{2}(x_2 - 3/2)^2 + 7/4.$$

The optimum is located at $x^* = (x_1^*, x_2^*, x_3^*) = (\tfrac{9}{4}, \tfrac{9}{4}, 0)$ with $f(x^*) = 1$.
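The true objective function of (3.26) is simple enough to write out directly. The sketch below is an illustration, not code from the study; it assumes the constant term of $f_1$ is 1, which is consistent with the stated optimal value $f(x^*) = 1$, and the function names are chosen for readability.

```python
def f1(x1, x2):
    """Quadratic surface active when x3 = 0; minimum value 1 at (9/4, 9/4)."""
    return (x1 - 9 / 4) ** 2 + (x2 - 9 / 4) ** 2 + 1

def f2(x1, x2):
    """Quadratic surface active when x3 = 1; minimum value 7/4 at (3/2, 3/2)."""
    return 0.5 * (x1 - 3 / 2) ** 2 + 0.5 * (x2 - 3 / 2) ** 2 + 7 / 4

def f(x1, x2, x3):
    """True mixed-variable objective: the binary x3 switches between f1 and f2."""
    return f1(x1, x2) * (1 - x3) + f2(x1, x2) * x3
```

Evaluating `f(9/4, 9/4, 0)` recovers the optimal value, and `f(3/2, 3/2, 1)` gives the best value attainable with the discrete variable fixed at 1, which is why a search confined to $x_3 = 1$ cannot reach the optimum.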

To compare two different random noise scenarios, the standard deviation of the error
term $\sigma(f(x))$ is either proportional or inversely proportional to $f$:

$$\sigma_1(f(x)) = f(x), \quad \text{or}$$

$$\sigma_2(f(x)) = \frac{1}{f(x)}.$$

These test cases are referred to as noise cases 1 and 2, respectively. At optimality, $\sigma_1$ and
$\sigma_2$ are equal but diverge for trial points away from optimality.
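The two noise cases can be sketched as a response sampler. This is an illustrative sketch under stated assumptions: it takes the standard deviations to be $f$ and $1/f$ exactly, and the function names and the explicit random generator argument are hypothetical conveniences rather than part of the original experiment.

```python
import random

def sigma1(fx):
    """Noise case 1: standard deviation proportional to the true value."""
    return fx

def sigma2(fx):
    """Noise case 2: standard deviation inversely proportional to the true value."""
    return 1.0 / fx

def response(fx, sigma, rng):
    """One noisy response F = f + N(0, sigma(f)^2) for a true value fx > 0."""
    return fx + rng.gauss(0.0, sigma(fx))

rng = random.Random(0)  # fixed seed so the illustration is repeatable
samples_case1 = [response(9.0, sigma1, rng) for _ in range(20000)]
samples_case2 = [response(9.0, sigma2, rng) for _ in range(20000)]
```

At a true value of 9 the two cases differ by a factor of 81 in standard deviation, so far fewer samples are needed in case 2 to estimate the mean to a given precision.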

The two-stage indifference-zone procedure of Rinott for unequal variances [114] was
implemented as the R&S method. This procedure uses two stages of sampling to estimate
the true mean of the response function for each candidate. In the first stage, the sample
variance $S_q^2$ for each candidate $q$ is computed from a fixed number of response samples for
each candidate. This information is used to determine the number of second-stage samples
required to guarantee a probability of correct selection. Given $s_0$ first-stage samples with
sample variance $S_q^2$ for each candidate $q$, Rinott's procedure prescribes $s_q - s_0$ additional
samples, where

$$s_q = \max\left\{ s_0, \left\lceil \left( \frac{g S_q}{\delta_r} \right)^2 \right\rceil \right\}, \tag{3.27}$$

$g = g(n_C, \alpha_r, s_0)$ is Rinott's constant, and $\lceil m \rceil$ indicates the smallest integer greater than
or equal to $m$ (ceiling function). Tabulated values for $g$ have been published for commonly
used parameter combinations, but $g$ was computed numerically in this investigation by
adapting the code listed in [27] to accommodate changing parameter settings. The objective
function value is then estimated by averaging the response samples over both stages
for each candidate. To satisfy the requirements on the R&S parameters, both $\delta_r$ and $\alpha_r$
were decremented geometrically, i.e., $\delta_r = \delta_0 (\rho_\delta)^r$ and $\alpha_r = \alpha_0 (\rho_\alpha)^r$. The following settings
were used in the numerical experiment: $s_0 = 5$, $\delta_0 = 1.0$, $\alpha_0 = 0.4$, and $\rho_\delta = \rho_\alpha = 0.95$.

[Figure 3.5. Example Test Function: $f(x_1, x_2, x_3) = f_1(x_1, x_2)(1 - x_3) + f_2(x_1, x_2) x_3$.]
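The second-stage sample size of a Rinott-type procedure follows from the first-stage sample variance, the constant $g$, and the indifference zone parameter. The sketch below assumes the standard form of Rinott's rule, $\max\{s_0, \lceil (g S_q / \delta)^2 \rceil\}$; the function names are hypothetical, and the value of `g` used in any call is a placeholder, since the true constant must come from tables or numerical integration.

```python
import math
import statistics

def rinott_total_samples(first_stage, g, delta):
    """Total samples for one candidate: max(s0, ceil((g * S_q / delta)^2)).

    first_stage: list of first-stage responses for the candidate (length s0 >= 2)
    g:           Rinott's constant (placeholder value supplied by the caller)
    delta:       indifference zone parameter
    """
    s0 = len(first_stage)
    variance = statistics.variance(first_stage)  # sample variance S_q^2 (n - 1 divisor)
    return max(s0, math.ceil(g * g * variance / (delta * delta)))

def additional_samples(first_stage, g, delta):
    """Number of second-stage samples still required after the first stage."""
    return rinott_total_samples(first_stage, g, delta) - len(first_stage)
```

For instance, five first-stage responses with sample variance 2.5, a placeholder `g = 3.0`, and `delta = 1.0` call for 23 samples in total, so 18 more are drawn in the second stage; a candidate with small variance relative to `delta` needs no second stage at all.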
For this example, the algorithm was implemented with an empty SEARCH step. The
direction set consisted of the coordinate axes, i.e., $D_k^i = [I, -I]$ for all $k$ and both settings
of $i$. The discrete neighbor set was defined as $\mathcal{N}(x) = \{(x_1, x_2, x_3), (x_1, x_2, 1 - x_3)\}$. The
step  size  parameters  were  set  to  r  =  |,  =  —2  for  all  k,  and  =  1  for  all  k  so  that 

$\Delta_{k+1} = \tau^{-2} \Delta_k$ for refinement and $\Delta_{k+1} = \tau \Delta_k$ for coarsening. These parameters were
selected so that the search steps lengthened after successful iterations but not so much as
to cause high variance for candidates in the poll set when near the optimal solution. The
initial  step  size  was  set  to  Aq  =  0.5.  The  extended  poll  trigger  |  was  used  for  all  k  to 
ensure  that,  at  the  minimum  =  (xi,X2)  =  (|,  |)  of  /2,  the  surface  of  fi  is  polled  since 

In the numerical experiment, the algorithm was replicated twenty times for each noise
case. All forty replications were initiated from starting solution $X_0 = (0, 5, 1)$, $f(X_0) = 9$.
For each noise case, the following metrics were used to gauge the performance:

• average number of candidate solutions visited,

• average distance of terminal solution from $x^*$: $\|X^* - x^*\|$,

• average difference of terminal solution in true objective function value from
$f(x^*)$: $|f - f^*|$, and

• average number of iterations completed.

The distance from optimum was measured as the sum of the Euclidean distance in the
continuous domain and the value of the discrete variable, $\|X^* - x^*\| = \|(X^c)^* - (x^c)^*\| + (X^d)^*$.
Each of the metrics was recorded at ten predetermined stages of algorithm progression,
measured in terms of the number of responses sampled, and averaged over the twenty
replications. The results for noise cases 1 and 2 are presented in Tables 3.1 and 3.2,
respectively.
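The distance metric used in the tables can be reproduced with a few lines. This sketch assumes the optimum $x^* = (9/4, 9/4, 0)$; the function name is hypothetical, and the absolute difference in the binary variable equals the value of the discrete variable because $x_3^* = 0$.

```python
import math

X_STAR = (9 / 4, 9 / 4, 0)  # optimum assumed from the example definition

def distance_from_optimum(x, x_star=X_STAR):
    """Euclidean distance in the continuous variables plus the discrete mismatch."""
    continuous = math.hypot(x[0] - x_star[0], x[1] - x_star[1])
    discrete = abs(x[2] - x_star[2])
    return continuous + discrete

start_distance = distance_from_optimum((0, 5, 1))
```

Rounded to two decimals, `start_distance` matches the 4.55 reported in the first row of both tables, where no responses have yet been sampled.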
Table 3.1. MGPS-RS Average Performance for Noise Case 1 over 20 Replications.

Response    Candidates
Samples     Visited       ||X* - x*||    |f - f*|    Iterations
0           -             4.55           8.00        -
2,500       .9            4.50           7.76        .2
5,000       6.9           4.13           6.26        1.2
7,500       13.5          3.67           4.53        2.5
10,000      22.8          2.85           2.82        4.4
20,000      73.5          .657           .343        13.4
30,000      99.8          .587           .294        17.9
40,000      116.2         .449           .198        20.9
50,000      130.1         .464           .185        23.5
75,000      150.6         .350           .137        27.6
100,000     166.4         .279           .122        30.8

Table 3.2. MGPS-RS Average Performance for Noise Case 2 over 20 Replications.

Response    Candidates
Samples     Visited       ||X* - x*||    |f - f*|    Iterations
0           -             4.55           8.00        -
2,500       113.7         .495           .272        19.9
5,000       133.0         .411           .204        23.1
7,500       145.5         .382           .184        25.3
10,000      153.3         .376           .172        26.8
20,000      173.0         .331           .143        30.6
30,000      185.0         .302           .123        32.9
40,000      193.0         .288           .113        34.5
50,000      200.5         .264           .095        36.0
75,000      213.0         .234           .073        38.5
100,000     222.0         .219           .062        40.3

The  results  clearly  illustrate  the  effects  of  the  response  variance  on  algorithm  perfor¬ 
mance.  Since  <72  =  at  the  starting  solution,  the  algorithm  in  noise  case  2  is  able  to 
reach  better  solutions  much  more  rapidly  than  in  noise  case  1.  In  fact,  it  takes  the  algorithm 
approximately  40,000  response  samples  in  case  1  to  reach  equivalent  progress  achieved  after 
2,500 samples in case 2. After 100,000 response samples, the algorithm in case 2 outperforms
case 1 by approximately 30% in the number of candidate solutions considered and
number of iterations completed while finding a solution roughly twice as good in terms of
quality  of  the  true  objective  function  value.  On  the  other  hand,  after  achieving  significant 
progress  after  2,500  response  samples  in  case  2,  progress  slows  considerably  after  reaching 
a  region  of  the  search  space  where  the  standard  deviation  of  the  noise  approaches  unity 
and  the  surface  begins  to  flatten.  For  noise  case  1,  the  error  control  measures  built  into 
the  algorithm  enable  consistent  progress  despite  the  challenging  situation  introduced  by 
high  response  variation.  It  should  be  noted  that  response  variation  has  a  profound  effect 
on computational requirements relative to the deterministic case. By comparison, applying
the algorithm to the noise-free version of this problem (i.e., $\sigma = 0$) produced a solution that
was within 0.0134 of $x^*$ with an objective function value within 0.000181 of $f(x^*)$ after 300
function samples.

For both noise cases, the solution found after 100,000 response samples was on the
surface of $f_1$. In noise case 1, sixteen of the twenty replications had permanently reached the
surface of $f_1$ by 20,000 response samples, which accounts for the significant improvement
between 10,000 and 20,000 samples in Table 3.1. In noise case 2, all iterates after the
eleventh iteration, on average, remained on $f_1$, well before 2,500 responses were evaluated.
In the most extreme case under noise case 2, the maximum number of iterations required
before all iterates remained on $f_1$ was twenty-three. In this case, the algorithm found a point
in the continuous design space for which the values of $f_1$ and $f_2$ were very close in magnitude.
In particular, at the point $x^c = (1.645, 1.692)$, the value $|f_1(x^c) - f_2(x^c)| = 0.111$. The
algorithm alternated between discrete neighbors $(x^c, 0)$ and $(x^c, 1)$ from iteration 13 until
iteration 23, during which time approximately 2,000 response samples were obtained and the
number of R&S procedures performed by the algorithm increased from 18 to 37. As a result,
the indifference zone parameter had been reduced from $\delta_{18} = (.95)^{18} = 0.397 > |f_1(x^c) - f_2(x^c)|$
to $\delta_{37} = (.95)^{37} = 0.150 > |f_1(x^c) - f_2(x^c)|$. Therefore, the R&S procedure could not
prescribe enough samples to detect the best solution among the two discrete neighbors until
$\delta_r$ had been reduced to a value that approached the absolute difference between the two
neighbors. This isolated case illustrates the potential computational requirements necessary
to detect small differences between candidate solutions in the presence of random variation.
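
The geometric decay of the indifference zone parameter quoted above can be reproduced with a short sketch. The initial value of 1 and the decay factor of 0.95 are inferred from the quoted values $(.95)^{18}$ and $(.95)^{37}$; the function names are hypothetical.

```python
def indifference_zone(r, delta0=1.0, rate=0.95):
    """Indifference zone parameter after r R&S procedures,
    assuming delta_r = delta0 * rate**r."""
    return delta0 * rate ** r

def first_r_below(gap, delta0=1.0, rate=0.95):
    """Smallest r for which delta_r falls below a given gap between
    the true means of two candidates."""
    r = 0
    while indifference_zone(r, delta0, rate) >= gap:
        r += 1
    return r

# Values quoted in the text for the 18th and 37th R&S procedures.
print(round(indifference_zone(18), 3), round(indifference_zone(37), 3))
```

Because the decay is slow, many R&S procedures can elapse before the parameter becomes comparable to a small true difference such as 0.111, which is consistent with the alternation between discrete neighbors described above.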

To  illustrate  the  asymptotic  behavior  of  the  algorithm,  the  algorithm  was  run  again 
starting  from  the  optimal  point  Xq  =  x*  =  (|,  |,0)  with  the  standard  deviation  of  the 
noise term $\sigma = 2$ throughout the design space. For this run, the same parameter settings
as in the original experiment were used except for the initial significance level, which was
set to $\alpha_0 = 0.8$ to encourage erroneous iterate selections for the purpose of illustration. The
run  was  terminated  after  two  million  response  samples.  The  iteration  history  is  depicted 
in  Figure  3.6.  The  figure  also  plots  the  decay  of  the  indifference  zone  parameter  as  the 
downward  sloping  curve  as  well  as  the  cumulative  response  samples  as  the  upward  sloping 
dashed  line  (with  scale  on  the  right). 

The plot shows that, although starting from the optimal point, many unimproving
steps are taken, indicated by an increase in true objective value of the iterates. However, the
magnitude of the difference between successive iterates, in most cases, is within the tolerance
defined by the indifference zone line. Three exceptions occur at iterations 1, 8, and 13, when
the true function value for the iterates jumps above the indifference zone boundary.
(Iteration 4, as well as iterations 36 through 40, are not examples of these exceptions, even
though the objective function values exceed $f(x^*) + \delta_k$, because the differences between
$f(x_k)$ at these iterations and that of the previous iterate do not exceed $\delta_k$.) In
these three cases, the significance levels were $\alpha_1 = .8(.95) = 0.76$, $\alpha_8 = .8(.95)^8 = 0.53$, and
$\alpha_{13} = .8(.95)^{13} = 0.41$, respectively. (Note that no extended poll steps were performed so
$r = k$ throughout the iteration sequence.) Therefore, the three iterates selected at iterations
1, 8, and 13 represent incorrect selections for which the allowed probability of incorrect
selection was 0.76, 0.53, and 0.41, respectively. However, as the iteration sequence continues, the
search settles down near the optimal point and the magnitude of unimproving solutions
decreases commensurate with the indifference zone parameter. Also, no additional iterates
are incorrectly selected since the significance level (not shown in the plot) is decaying at
the same rate as the indifference zone parameter. However, the costs associated with error
control during the latter stages of the search are evident with the rapid increase in response
samples required per iteration.

Figure 3.6. Asymptotic behavior of MGPS-RS, shown after 2 million response
samples.

Figure  3.6  illustrates  the  computational  implications  of  achieving  better  solutions  as 
the  search  progresses.  Clearly,  for  MGPS-RS  algorithms  to  have  practical  value,  it  is 
important  to  address  concerns  regarding  the  sampling  effort  required.  In  this  chapter, 
the  mathematical  framework  of  MGPS-RS  was  presented  and  its  convergence  properties 
rigorously  established.  In  the  following  chapter,  various  implementation  alternatives  are 
described  that  seek  efficient  use  of  the  sampling  budget. 




Chapter  4  -  Algorithm  Implementations 

In this chapter, the details of the various MGPS-RS algorithm implementations are presented.
Particular attention is given to implementations that have the potential to provide
computational  enhancements  to  the  basic  algorithm.  Two  essential  ideas  are  presented  that 
specifically  address  methods  to  improve  the  computational  performance  of  the  algorithms. 
The  first,  described  in  Section  4.1,  is  the  use  of  modern  ranking  and  selection  techniques 
to  offer  more  efficient  sampling  strategies  relative  to  the  basic  procedure  of  Rinott.  The 
second  idea  is  to  augment  the  search  by  using  surrogate  functions  during  the  SEARCH  step 
as  a  means  to  model  the  relationship  between  input  designs  and  response  outputs  based 
on  previously  obtained  samples.  The  goal  of  this  approach,  introduced  in  Section  4.2,  is  to 
develop  an  inexpensive  method  to  nominate  high  quality  trial  points  and  thus  accelerate 
algorithm  convergence.  Another  important  concept  relevant  to  computational  performance 
is discussed in Section 4.3, which proposes a strategy for establishing appropriate algorithm
termination criteria. The strategy seeks to avoid additional sampling when further
sampling  would  lead  to  marginal  returns  on  objective  function  value  improvement.  Section 
4.4  unifies  the  various  implementation  considerations  into  an  overarching  algorithm  design 
that  describes  implementation  in  further  detail  for  the  algorithm  substeps  and  summarizes 
the  algorithm  parameters.  Section  4.5  summarizes  the  key  points  of  the  chapter. 

4.1 Specific Ranking and Selection (R&S) Procedures

An  important  concern  for  implementation  is  the  selection  of  specific  R&S  procedures. 
Since  unknown  and  unequal  variances  are  allowed,  a  procedure  having  at  least  two  stages 
is  required,  allowing  the  sample  variance  to  be  computed  in  an  initial  stage.  Three  such 
procedures were selected for implementation and computational evaluation: Rinott's
two-stage procedure [114], a screen-and-select (SAS) procedure of Nelson et al. [99], and the
Sequential Selection with Memory (SSM) procedure of Pichitlamken and Nelson [109].


Rinott's two-stage procedure, described in Section 3.8, is a well-known, simple procedure
that satisfies the probability of correct selection guarantee (3.15). It uses the sample
variance from a fixed number of first-stage samples for each candidate to determine the
number of second-stage samples required to guarantee the probability of correct selection.
A detailed listing of Rinott's procedure, adapted from [27, p. 61], is provided in Figure 4.1.
In the procedure, the number of second-stage samples is dependent on Rinott's constant
$g = g(n_C, \alpha, \nu)$, which is the solution to the equation

$$\int_0^\infty \left[ \int_0^\infty \Phi\!\left( \frac{g}{\sqrt{\nu\,(1/x + 1/y)}} \right) f_\nu(x)\,dx \right]^{n_C - 1} f_\nu(y)\,dy = 1 - \alpha \qquad (4.1)$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function and $f_\nu(\cdot)$ is the
probability density function of the $\chi^2$-distribution with $\nu$ degrees of freedom. This constant
can be obtained from a table of values or computed numerically. To account for the changing
parameter $\alpha$ in the computational evaluation of Chapter 5, a MATLAB® m-file was
written to compute $g$ that was based on the FORTRAN program RINOTT listed in Appendix C
of [27].

Rinott's procedure can be computationally inefficient because it is constructed based
on the least favorable configuration assumption that the best candidate has a true mean
exactly $\delta$ better than all remaining candidates, which are all tied for second best [140]. As
a result, the procedure can overprescribe the number of required second stage samples in
order to guarantee the P{CS}. Furthermore, the procedure has no mechanism to consider
the sample mean of the responses after the first stage, and therefore cannot eliminate
clearly inferior candidates prior to conducting additional sampling. These characteristics
are especially problematic within an iterative search framework since the R&S procedure is
executed repeatedly and the number of unnecessary samples accumulates at each iteration,
limiting the progress of the algorithm relative to a fixed budget of response samples.

For a candidate set $C$ indexed by $q \in \{1, \ldots, n_C\}$, fix the common number of replications $s_0 \ge 2$
to be taken in Stage 1, significance level $\alpha$, and indifference zone parameter $\delta$. Find the constant
$g = g(n_C, \alpha, \nu)$ that solves (4.1) where $\nu = s_0 - 1$.

Stage 1: For each candidate $q$, collect $s_0$ response samples $F_{qs}$, $s = 1, \ldots, s_0$.

Stage 2: Calculate the sample means and variances based on the Stage 1 samples,
$\bar{F}_q(s_0) = \frac{1}{s_0} \sum_{s=1}^{s_0} F_{qs}$ and $S_q^2 = \frac{1}{\nu} \sum_{s=1}^{s_0} (F_{qs} - \bar{F}_q(s_0))^2$.
Collect $s_q - s_0$ additional response samples for candidate $q = 1, 2, \ldots, n_C$ where

$$s_q = \max\left\{ s_0, \left\lceil (g S_q / \delta)^2 \right\rceil \right\}.$$

Calculate $\bar{F}_q(s_q) = s_q^{-1} \sum_{s=1}^{s_q} F_{qs}$, $q = 1, 2, \ldots, n_C$, based on the combined results of the Stage 1 and
Stage 2 samples. Select the candidate associated with the smallest sample mean over both stages,
$\min_q \{\bar{F}_q(s_q)\}$, as having the $\delta$-near-best mean.

Figure 4.1. Rinott Selection Procedure (adapted from [27])
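
The two stages of Rinott's procedure can be sketched as follows. This is a minimal illustration rather than the implementation used in this research: the name `rinott_select`, the representation of each candidate as a zero-argument sampling function, and the choice to pass Rinott's constant `g` in as an argument (instead of solving (4.1)) are all assumptions made here.

```python
import math
import statistics

def rinott_select(samplers, s0, g, delta):
    """Two-stage selection of the candidate with the smallest mean.

    samplers: list of zero-argument callables, each returning one noisy
              response for its candidate (hypothetical interface).
    g:        Rinott's constant, assumed to be supplied from a table or
              a separate numerical routine.
    """
    means, sizes = [], []
    for draw in samplers:
        # Stage 1: s0 samples and the sample variance (s0 - 1 d.o.f.).
        data = [draw() for _ in range(s0)]
        var = statistics.variance(data)
        # Stage 2: total sample size prescribed for this candidate.
        s_q = max(s0, math.ceil((g * math.sqrt(var) / delta) ** 2))
        data.extend(draw() for _ in range(s_q - s0))
        means.append(statistics.fmean(data))
        sizes.append(s_q)
    best = min(range(len(samplers)), key=means.__getitem__)
    return best, means, sizes
```

Each candidate is sampled to completion before the next one is touched, because the second-stage sample size depends only on that candidate's own first-stage variance and not on comparisons with the other candidates.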

The SAS procedure alleviates some of the computational concerns of Rinott's procedure
by combining Rinott's procedure with a screening step that can eliminate some solutions
after the first stage. For an overall significance level $\alpha$, significance levels $\alpha_1$ and $\alpha_2$
are chosen for screening and selection, respectively, such that $\alpha = \alpha_1 + \alpha_2$. After collecting
$s_0$ samples of each candidate in the first stage, those candidates with a sample mean that is
significantly inferior to the best of the rest are eliminated from further sampling. The set
of surviving candidates is guaranteed to contain the best with probability at least $1 - \alpha_1$
as long as the indifference zone condition is met. Then, $s_q - s_0$ second stage samples are
required only for the survivors according to (3.27) except at significance level $\alpha_2$ instead of
$\alpha$. Nelson et al. [99] prove that the combined procedure satisfies (3.15). A detailed listing
of the combined screen-and-select procedure is provided in Figure 4.2.
For a candidate set $C$ indexed by $q \in \{1, \ldots, n_C\}$, fix the common number of first-stage samples
$s_0 \ge 2$, overall significance level $\alpha = \alpha_1 + \alpha_2$, screening significance level $\alpha_1$, selection significance
level $\alpha_2$, and indifference zone parameter $\delta$. Set $t = t_{(1-\alpha_1)^{1/(n_C-1)},\,\nu}$ and $g = g(n_C, \alpha_2, \nu)$, where
$t_{\beta,\nu}$ is the $\beta$ quantile of the $t$ distribution with $\nu = s_0 - 1$ degrees of freedom and $g$ is Rinott's constant
that solves (4.1).

Stage 1. (Screening) For each candidate $q = 1, 2, \ldots, n_C$, collect $s_0$ response samples $F_{qs}$,
$s = 1, \ldots, s_0$. Calculate the sample means and variances based on the initial $s_0$ samples,
$\bar{F}_q(s_0) = \sum_{s=1}^{s_0} F_{qs} / s_0$ and $S_q^2 = \frac{1}{\nu} \sum_{s=1}^{s_0} (F_{qs} - \bar{F}_q(s_0))^2$. Let

$$W_{qp} = t \left( \frac{S_q^2}{s_0} + \frac{S_p^2}{s_0} \right)^{1/2} \quad \text{for all } q \ne p.$$

Set $Q = \{ q : 1 \le q \le n_C \text{ and } \bar{F}_q(s_0) \le \bar{F}_p(s_0) + (W_{qp} - \delta)^+, \ \forall p \ne q \}$ where $y^+ = \max\{0, y\}$. If
$|Q| = 1$, then stop and report the only survivor as the best; otherwise, for each $q \in Q$ compute the
second stage sample size

$$s_q = \max\left\{ s_0, \left\lceil (g S_q / \delta)^2 \right\rceil \right\}.$$

Stage 2. (Selection) Collect $s_q - s_0$ additional response samples for the survivors of the screening
step $q \in Q$ and compute overall sample means $\bar{F}_q(s_q) = s_q^{-1} \sum_{s=1}^{s_q} F_{qs}$, $q \in Q$. Select the candidate
associated with the smallest sample mean over both stages, $\min_q \{\bar{F}_q(s_q)\}$, as having the $\delta$-near-best
mean.

Figure 4.2. Combined Screening and Selection (SAS) Procedure (adapted from
[99])
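
The screening rule of the first stage can be illustrated with a short sketch. It is a simplified example under stated assumptions: the function name `screen` is hypothetical, the first-stage sample means and variances are taken as already computed, and the $t$ quantile is passed in as a number rather than computed from $\alpha_1$.

```python
import math

def screen(means, variances, s0, t, delta):
    """Return indices of candidates that survive first-stage screening
    (minimization). `t` is the t-distribution quantile, assumed to be
    supplied by the caller."""
    n = len(means)
    survivors = []
    for q in range(n):
        keep = True
        for p in range(n):
            if p == q:
                continue
            # Pairwise screening width W_qp.
            w = t * math.sqrt(variances[q] / s0 + variances[p] / s0)
            # Eliminate q if it is worse than p by more than (W_qp - delta)^+.
            if means[q] > means[p] + max(0.0, w - delta):
                keep = False
                break
        if keep:
            survivors.append(q)
    return survivors

# Three candidates: the third is clearly inferior and is screened out.
print(screen([1.0, 1.2, 5.0], [1.0, 1.0, 1.0], s0=10, t=2.0, delta=0.5))
```

Only the surviving candidates would then receive second-stage samples, which is where the savings relative to Rinott's procedure arise.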

The SSM procedure extends the notion of intermediate elimination of inferior solutions.
It is a fully sequential procedure specifically designed for iterative search routines. A fully
sequential procedure is one that takes one sample at a time from every candidate still in
play and eliminates clearly inferior ones as soon as their inferiority is apparent. The SSM
procedure is an extension of the procedure presented in [69]; the difference is that SSM
allows the re-use of previously sampled responses when design points are revisited, whereas
the procedure in [69] does not.

In SSM, an initial stage of sampling is conducted to estimate the variances between
each pair of candidates, indexed by $q, p \in \{1, \ldots, n_C\}$, according to

$$S_{qp}^2 = \frac{1}{\nu} \sum_{s=1}^{s_0} \left( F_{qs} - F_{ps} - \left[ \bar{F}_q(s_0) - \bar{F}_p(s_0) \right] \right)^2, \qquad (4.2)$$

where $\nu = s_0 - 1$. This is followed by a sequence of screening steps that eliminates candidates whose
cumulative sums exceed the best of the rest plus a tolerance level that depends on the variances
and parameters $\delta$ and $\alpha$. Between each successive screening step, one additional sample is
taken from each survivor and the tolerance level decreases. The procedure terminates when
only one survivor remains or after exceeding a maximum number of samples determined
after the initial stage. In the latter case, the survivor with the minimum sample mean is
selected as the best. Pichitlamken [108] proves that SSM satisfies (3.15). An advantage of
this method is that the re-use of previously sampled responses can lead to further computational
savings. A detailed listing of SSM is shown in Figure 4.3.

The  latter  two  R&S  procedures  were  implemented  because  they  offer  more  efficient 
sampling  methods  relative  to  Rinott’s  procedure  when  the  least  favorable  configuration 
assumption  does  not  hold;  however,  this  advantage  does  not  come  without  cost.  In  order 
to  reduce  sampling,  they  must  repeatedly  switch  among  the  various  candidates.  If  each 
candidate  represents  a  single  instance  of  a  simulation  model,  then  there  may  be  a  sizable 
switching  cost  that  can  require,  for  example,  storing  the  state  information  of  the  current 
model,  saving  relevant  output  data,  replacing  the  executable  code  in  active  memory  with 
the  code  of  the  next  model,  and  restoring  the  state  information  of  the  next  model  [59]. 
An  important  element  of  evaluating  the  R&S  procedures  in  the  MGPS-RS  framework  is  to 
consider  the  number  of  cumulative  switches  required,  where  the  term  switch  denotes  each 
time  the  algorithm  must  return  to  a  previously  sampled  candidate  for  further  sampling 
during  the  same  iteration. 

Rinott's procedure incurs no switches because the second stage of sampling for each
candidate can begin immediately after the first stage since the number of second stage
samples does not depend on comparisons of output data between candidates. The SAS
procedure of Figure 4.2 requires a single switch for each candidate that survives the screening
step if a second stage is necessary. The SSM procedure requires a switch each time an
additional sample is collected in Step 4 of Figure 4.3, which can potentially lead to a large
number of switches if the number of candidates is large and if the candidates are nearly
homogeneous in terms of mean response. The computational evaluation of Chapter 5 addresses
the tradeoff between sampling costs and switching costs.

Step 1. Initialization. For a candidate set $C$ indexed by $q \in \{1, \ldots, n_C\}$, fix the common number
of minimum samples $s_0 \ge 2$, significance level $\alpha$, and indifference zone parameter $\delta$. Let $V$ denote
the set of solutions visited previously. Let $V^c$ denote the set of solutions seen for the first time.
For each $Y_q \in V^c$, collect $s_0$ response samples $F_{qs}$, $s = 1, \ldots, s_0$. For each $Y_q \in V \cap C$ with $s_q < s_0$ stored
responses, collect additional response samples $F_{qs}$, $s = s_q + 1, \ldots, s_0$ and set $s_q = s_0$. Update
$V^c = V^c \cup Y_q$ and $V = V \setminus Y_q$. Compute variance $S_{qp}^2$ using (4.2) where $\nu = s_0 - 1$.

Step 2. Procedure parameters: Let

$$a_{qp} = \frac{\nu S_{qp}^2}{2\delta} \left[ \left( \frac{n_C - 1}{2\alpha} \right)^{2/\nu} - 1 \right]. \qquad (4.3)$$

Let $R_{qp} = \left\lfloor 2 a_{qp} / \delta \right\rfloor$, $R_q = \max_{p \ne q} \{R_{qp}\}$, and $R = \max_q \{R_q\}$. If $s_0 > R$, then stop and select the solution
with the lowest $\bar{F}_q(s_0) = s_0^{-1} \sum_{s=1}^{s_0} F_{qs}$ as the best. Otherwise, let $Q = \{1, \ldots, n_C\}$ be the set of
surviving solutions, set $t = s_0$ and proceed to Step 3. From here on $V$ represents the set of solutions
for which more than $t$ observations have been obtained, while $V^c$ is the set of solutions with exactly
$t$ observations.

Step 3. Screening: Set $Q^{old} = Q$. Let

$$Q = \left\{ q : q \in Q^{old} \text{ and } T_q \le \min_{p \in Q^{old},\, p \ne q} \{ T_p + a_{qp} \} - \frac{t\delta}{2} \right\} \qquad (4.4)$$

where

$$T_p = \begin{cases} \sum_{s=1}^{t} F_{ps} & \text{for } Y_p \in V^c \\ t \bar{F}_p(s_p) & \text{for } Y_p \in V. \end{cases}$$

In essence, for $Y_q$ with $s_q > t$, $t \bar{F}_q(s_q)$ is substituted for $\sum_{s=1}^{t} F_{qs}$.

Step 4. Stopping Rule: If $|Q| = 1$, then stop and report the only survivor as the best; otherwise,
for each $q \in Q$ and $Y_q \in V^c$, collect one additional response sample and set $t = t + 1$. If $t = R + 1$,
terminate the procedure and select the solution in $Q$ with the smallest sample mean as the best;
otherwise, for each $q \in Q$ and $Y_q \in V$ with $s_q = t$, set $V^c = V^c \cup Y_q$ and $V = V \setminus Y_q$ and go to Step 3.

Figure 4.3. Sequential Selection with Memory (adapted from [109])
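
A fully sequential screening loop, and the switch count it generates, can be sketched as follows. This is a simplified illustration and not the SSM procedure itself: it keeps no memory of responses from earlier iterations, the name `sequential_select` and the sampler interface are hypothetical, and the tolerance is taken as $\max\{0, a_{qp} - t\delta/2\}$, which is an assumption made here so that the tolerance shrinks with each added sample and the survivor set can never become empty.

```python
import math
import statistics

def sequential_select(samplers, s0, alpha, delta):
    """Fully sequential selection of the smallest mean (no memory).
    Returns (best index, total samples taken, number of switches)."""
    n = len(samplers)
    nu = s0 - 1
    data = [[draw() for _ in range(s0)] for draw in samplers]

    # Tolerance constants from the variance of pairwise differences.
    eta = ((n - 1) / (2.0 * alpha)) ** (2.0 / nu) - 1.0
    a = {}
    for q in range(n):
        for p in range(n):
            if q != p:
                diffs = [u - v for u, v in zip(data[q], data[p])]
                a[q, p] = nu * statistics.variance(diffs) * eta / (2.0 * delta)
    max_t = max(math.floor(2.0 * v / delta) for v in a.values())

    def best_of(cands):
        return min(cands, key=lambda q: statistics.fmean(data[q]))

    survivors = list(range(n))
    switches = 0
    t = s0
    if s0 <= max_t:
        while True:
            totals = {q: sum(data[q]) for q in survivors}
            # Keep q only if its cumulative sum is within tolerance of
            # every other survivor's cumulative sum.
            survivors = [
                q for q in survivors
                if all(totals[q] <= totals[p]
                       + max(0.0, a[q, p] - t * delta / 2.0)
                       for p in survivors if p != q)
            ]
            if len(survivors) == 1:
                break
            for q in survivors:
                data[q].append(samplers[q]())  # return to a sampled candidate
                switches += 1
            t += 1
            if t == max_t + 1:
                break
    return best_of(survivors), sum(len(d) for d in data), switches
```

Every additional sample taken inside the loop revisits a candidate that was already sampled, so the switch count grows with the number of survivors and the number of screening rounds, which mirrors the tradeoff discussed above.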
4.2 Use of Surrogate Models

To  further  address  computational  enhancements  for  MGPS-RS  algorithms,  an  optional 
SEARCH  step  was  implemented  to  exploit  the  flexibility  of  the  pattern  search  framework 
with  the  goal  of  accelerating  convergence  to  a  near-optimal  region  of  the  design  space.  In 
particular,  previously  sampled  points  evaluated  prior  to  and  during  the  search  are  used 
to  construct  a  surrogate  function  that  approximates  the  true  objective  function.  This 
function  is  then  searched  during  the  SEARCH  step  to  nominate  high  quality  trial  points.  If 
the  surrogate  is  reasonably  accurate  and  can  be  evaluated  inexpensively  relative  to  the  cost 
of  generating  response  samples,  then  the  search  may  progress  to  good  solutions  with  fewer 
cumulative  samples  than  if  no  SEARCH  step  is  used.  Even  if  the  initial  surrogate  is  poor, 
convergence  is  still  guaranteed  and  a  savings  may  still  be  achieved  (see  [31]). 

Paramount to the construction of surrogates is the selection of a family of plausible
functions for use in approximating the true objective function. To avoid assuming a specific
parametric representation of the underlying structure of $f$, an estimation technique is used
from the nonparametric regression literature; the Nadaraya-Watson estimator [95, 151] is
used to approximate the objective function at a point $x$ according to (2.8). In this dissertation
research, the commonly used multivariate Gaussian kernel function, originally proposed
in univariate form by Parzen [105], is used. This results in the regression equation

$$\hat{f}(x) = \frac{\sum_{j=1}^{N} \bar{F}_j \exp\left( -\frac{D_j^2}{2h^2} \right)}{\sum_{j=1}^{N} \exp\left( -\frac{D_j^2}{2h^2} \right)} \qquad (4.5)$$

where $D_j^2 = (x - x_j)^T (x - x_j)$ represents the squared Euclidean distance from $x$ to $x_j$ and $h > 0$
is a smoothing parameter that determines the width of the kernel centered at each site
$x_j$. For this reason, $h$ is often called the bandwidth. The estimator $\hat{f}$ is referred to as the
surrogate function. The regression function (4.5) has also been described in the context of
generalized regression neural networks [135].

As discussed in Section 2.1.5, the estimator $\hat{f}$ can be thought of as the weighted average
of all response means, $\bar{F}_j$, where the weight received by $\bar{F}_j$ depends on the distance between
the corresponding $x_j$ and the estimation point. The bandwidth $h$ essentially determines
the degree of nonlinearity in the surrogate function. As $h$ increases, the curvature in $\hat{f}$
decreases such that, when $h$ is very large, $\hat{f}$ is a constant that assumes the mean value
of all $\bar{F}_j$; i.e., $\frac{1}{N} \sum_{j=1}^{N} \bar{F}_j$. Smaller values of $h$ allow more curvature in $\hat{f}$ but can cause
outliers to have too great an effect on the estimate. In the limit as $h$ approaches zero, $\hat{f}$ assumes
the value of $\bar{F}_j$ for the corresponding $x_j$ that is nearest the estimation point. The effect of the bandwidth
value is illustrated in Figure 4.4, where the surrogate function (4.5) is fit to the following
eight input/response pairs: (1, 180), (2, 189), (3, 170), (4, 188), (5, 207), (6, 212), (7, 196),
and (8, 257). The figure shows that as the bandwidth increases, the surrogate function
becomes less descriptive in terms of the curvature of the surface, eventually flattening to a
horizontal line equalling the mean of the responses.
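
The two limiting behaviors of the bandwidth can be checked numerically on the eight input/response pairs. The sketch below is one-dimensional and assumes a Gaussian kernel exponent of the form $-D_j^2 / (2h^2)$; the function name `surrogate` is hypothetical, and the exponents are shifted by their maximum purely for numerical stability, which leaves the weighted average unchanged.

```python
import math

SITES = [1, 2, 3, 4, 5, 6, 7, 8]
MEANS = [180, 189, 170, 188, 207, 212, 196, 257]

def surrogate(x, sites, means, h):
    """Kernel-weighted average of the response means at point x
    (one-dimensional sketch, assumed exponent -D^2 / (2 h^2))."""
    exps = [-((x - xj) ** 2) / (2.0 * h * h) for xj in sites]
    shift = max(exps)  # guards against all weights underflowing to zero
    weights = [math.exp(e - shift) for e in exps]
    return sum(w * f for w, f in zip(weights, means)) / sum(weights)

# Very large bandwidth: essentially the mean of all responses.
print(surrogate(4.5, SITES, MEANS, h=1e6))
# Very small bandwidth: essentially the response at the nearest site.
print(surrogate(3.0, SITES, MEANS, h=0.05))
```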


Figure 4.4. Smoothing effect of various bandwidth settings for fitting a surface
to eight design sites in one dimension.
An advantage of the kernel regression approach is its simplicity. To evaluate a surrogate
function at a design point, all that is required is storage of the pairs $(x_j, \bar{F}_j)$ and an
"appropriate setting" for the lone bandwidth parameter. However, care must be taken in
the practical consideration of selecting this setting. In this dissertation research, the
well-known leave-one-out cross-validation method [55, p. 152] is used. In this method, the
bandwidth is first set to a fixed value, and the estimator $\hat{f}$ is computed according to (4.5)
at design site $x_j$, except that $x_j$ is excluded from (left out of) the summations,

$$\hat{f}_j(x_j, h) = \frac{\sum_{i=1,\, i \ne j}^{N} \bar{F}_i \exp\left( -\frac{D_i^2}{2h^2} \right)}{\sum_{i=1,\, i \ne j}^{N} \exp\left( -\frac{D_i^2}{2h^2} \right)},$$

where $D_i^2$ here denotes the squared Euclidean distance from $x_j$ to $x_i$.
The squared error $(\hat{f}_j(x_j, h) - \bar{F}_j)^2$ is then recorded and summed over all sites $x_j$,
$j = 1, \ldots, N$. The resulting sum of squared errors (SSE),

$$SSE(h) = \sum_{j=1}^{N} \left( \hat{f}_j(x_j, h) - \bar{F}_j \right)^2,$$

is then used as a criterion for evaluating $h$. This procedure is repeated over a range of
bandwidth values and the setting that delivers the smallest SSE is selected.
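
The bandwidth selection loop can be sketched as follows for one-dimensional design sites. The function names, the candidate grid of bandwidths, and the kernel exponent $-D^2 / (2h^2)$ are assumptions made for illustration; the range of bandwidths actually searched is not stated here.

```python
import math

def loo_estimate(j, sites, means, h):
    """Kernel estimate at site j with site j left out of both sums."""
    exps = [-((sites[j] - xi) ** 2) / (2.0 * h * h)
            for i, xi in enumerate(sites) if i != j]
    vals = [f for i, f in enumerate(means) if i != j]
    shift = max(exps)  # numerical stability only
    weights = [math.exp(e - shift) for e in exps]
    return sum(w * f for w, f in zip(weights, vals)) / sum(weights)

def sse(sites, means, h):
    """Leave-one-out sum of squared errors for bandwidth h."""
    return sum((loo_estimate(j, sites, means, h) - means[j]) ** 2
               for j in range(len(sites)))

def select_bandwidth(sites, means, grid):
    """Bandwidth in the (hypothetical) grid with the smallest SSE."""
    return min(grid, key=lambda h: sse(sites, means, h))

sites = [1, 2, 3, 4, 5, 6, 7, 8]
means = [180, 189, 170, 188, 207, 212, 196, 257]
print(select_bandwidth(sites, means, [0.25, 0.5, 1.0, 2.0, 4.0]))
```

A grid search is used only for simplicity; any one-dimensional minimization over $h$ would serve the same purpose.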

To build the original surrogate function prior to initiating the search, it is necessary to
select design sites $x_1, \ldots, x_N$ via some appropriate experimental design technique. For this
purpose, Latin hypercube sampling (LHS) [88] is used. In LHS, a total of $p$ equally spaced
values for each of the continuous variables are used as components of the design site
vectors. These values are randomly matched to form $p$ design sites. If $N = p$, then each of
the $p$ values is represented exactly once in the set of design sites $x_1, \ldots, x_N$, and the design
is said to be of strength one. Designs of strength two ($N = 2p$) are used in this dissertation
research so that the design space is sampled more densely; the random matching operation
is performed twice. Figure 4.5 illustrates Latin hypercube samples of strengths one and two
for a two-dimensional design space.


Figure 4.5. Examples of Latin Hypercube Samples of Strengths 1 and 2 for p = 5
(left panel: Strength 1; right panel: Strength 2).
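
The random matching operation can be sketched as follows. The function name `latin_hypercube` is hypothetical, and placing the $p$ equally spaced values so that they include both bounds of each variable is an assumption, since the exact placement of the values within the range is not specified here.

```python
import random

def latin_hypercube(bounds, p, strength=1, rng=random):
    """Return strength * p design sites over the continuous variables.

    bounds: list of (low, high) pairs, one per continuous variable.
    Each of the p levels of every variable appears exactly `strength`
    times across the returned sites. Requires p >= 2.
    """
    sites = []
    for _ in range(strength):
        columns = []
        for lo, hi in bounds:
            levels = [lo + (hi - lo) * k / (p - 1) for k in range(p)]
            rng.shuffle(levels)  # random matching across variables
            columns.append(levels)
        sites.extend(zip(*columns))
    return sites

# Strength-two design with p = 5 on a two-dimensional design space.
for site in latin_hypercube([(-10, 10), (-10, 10)], p=5, strength=2):
    print(site)
```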


Once  the  surrogate  function  is  built,  it  can  be  utilized  in  the  pattern  search  framework 
as an inexpensive means to nominate trial points from the mesh $M_k$ in the SEARCH step
of the algorithm. After trial points are evaluated, they are then added as design sites that
enhance the accuracy of the surrogate function. A straightforward approach is simply to
minimize $\hat{f}$ on the mesh directly using any deterministic search routine. However, such a
greedy  approach  may  inhibit  improvements  in  accuracy  of  the  surrogate  function  because 
the  trial  points  will  cluster  in  a  particular  region  of  the  design  space. 

Alternatively, the technique of Torczon and Trosset [145] is used to seek improvements
in $\hat{f}$ while simultaneously seeking space-filling points that could improve the accuracy of the
surrogate function. Torczon and Trosset [145] propose a biobjective function of the form

$$m(x) = \hat{f}(x) - \lambda\, d(x) \qquad (4.6)$$

where $d(x) = \min_j \|x - x_j\|_2$ is the distance from $x$ to the nearest previously sampled design
site and $\lambda > 0$ determines the relative weight placed on the space-filling objective. The
function $m(x)$ is referred to as the merit function.
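
The effect of the space-filling weight on the nominated trial point can be seen in a small sketch. The function names and the use of a finite list of mesh points are assumptions made for illustration; the surrogate is passed in as an arbitrary callable.

```python
import math

def nearest_site_distance(x, sites):
    """Distance from x to the nearest previously sampled design site."""
    return min(math.dist(x, s) for s in sites)

def merit(x, surrogate, sites, lam):
    """Merit value: surrogate estimate minus weighted space-filling term."""
    return surrogate(x) - lam * nearest_site_distance(x, sites)

def nominate(mesh_points, surrogate, sites, lam):
    """Mesh point with the smallest merit value."""
    return min(mesh_points, key=lambda x: merit(x, surrogate, sites, lam))

sites = [(0.0, 0.0)]
mesh = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
bowl = lambda x: x[0] ** 2 + x[1] ** 2   # stand-in surrogate
print(nominate(mesh, bowl, sites, lam=0.0))   # pure surrogate minimizer
print(nominate(mesh, bowl, sites, lam=10.0))  # favors unexplored regions
```

With a weight of zero the nomination reduces to minimizing the surrogate directly, while larger weights push trial points away from the sites already sampled.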

The surrogate function $\hat{f}$ is a smooth approximation to the unknown objective function
with respect to the continuous variables. If one or more discrete variables change, then the
true objective function may have an entirely different structure. Therefore, when using
surrogates in a mixed-variable pattern search framework, it is necessary to maintain a
surrogate function for each combination of discrete variables $i = 1, \ldots, i_{max}$. Consequently,
the number of initial design sites, surrogate function, merit function, bandwidth, and
space-filling parameter are indexed as $N_i$, $\hat{f}_i$, $m_i$, $h_i$, and $\lambda_i$, respectively. If $N_i$ design sites are
selected for combination $i$, each requiring $s$ samples of the response function, then a budget
of $\sum_{i=1}^{i_{max}} N_i s$ response samples is required during algorithm initialization.

The space-filling parameters $\lambda_i$ need not remain constant throughout the search. In
fact, as Torczon and Trosset [145] suggest, these parameters should tend to zero so that,
after the surrogate is sufficiently accurate, the algorithm searches the surrogate directly. In
the implementation, initial settings are used that are multiples of the maximum difference
between mean responses of the initial design sites for each combination of discrete variable
values. As the algorithm progresses, the parameters $\lambda_i$ decay after each SEARCH step.

The MGPS-RS algorithm using surrogates is now illustrated on a very simple example
that has two continuous variables and one discrete (binary) variable, where each continuous
variable is bounded on the range $[-10, 10]$. Consider the same additive noise response
function (3.25) and true function (3.26) from Section 3.8. In this example, the functions
$f_1$ and $f_2$ are linear and quadratic functions, respectively, that overlap in the continuous
domain. Therefore, when the discrete variable $x^d$ is 0 (1), the function takes a linear
(quadratic) form. The functions are defined as

$$f_1(x_1, x_2) = 21 - x_1 - x_2, \quad \text{and}$$
$$f_2(x_1, x_2) = x_1^2 + x_2^2.$$

The optimum is located at $x^* = (0, 0, 1)$ with $f(x^*) = 0$, and the starting point was set
to $x_0 = (-5, -5, 0)$ with $f(x_0) = 31$ so that the initial value of the discrete variable is
suboptimal. The standard deviation of the noise term was set to $\sigma = 2$ throughout the
design space.
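
The example problem is small enough to write down directly. The sketch below assumes the additive noise term is zero-mean Gaussian, which the text does not state here; the function names are illustrative.

```python
import random


def true_function(x1, x2, d):
    """True objective: linear when d == 0, quadratic when d == 1."""
    if d == 0:
        return 21.0 - x1 - x2
    if d == 1:
        return x1 ** 2 + x2 ** 2
    raise ValueError("d must be 0 or 1")


def noisy_response(x1, x2, d, sigma=2.0, rng=random):
    """One response sample: true value plus additive noise.

    Assumption: the noise is zero-mean Gaussian with standard deviation sigma.
    """
    return true_function(x1, x2, d) + rng.gauss(0.0, sigma)


start_value = true_function(-5.0, -5.0, 0)    # 31.0 at the starting point
optimal_value = true_function(0.0, 0.0, 1)    # 0.0 at the optimum
```

Passing a seeded `random.Random` instance as `rng` makes sample sequences reproducible across runs.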

A progression of the algorithm is illustrated in Figure 4.6. For comparison purposes, the
true function is shown in Figure 4.6a. In Figure 4.6b, the initial surrogate surfaces are
shown for each of the two binary variable settings. A strength-one LHS design with $p = 10$
was used to determine the initial design sites, and five samples were taken at each design site.
Therefore, a budget of $10 \times 5 \times 2 = 100$ response samples was necessary to build the original
surrogates. Figure 4.6c shows the surrogate surfaces after nine iterations of the algorithm.
By this point, 500 response samples had been generated and eight new design sites had
been added to the surrogates. It can be seen how the space-filling parameter has forced
the algorithm to evaluate points relatively far from the minimal point on either surrogate
surface. Figure 4.6d shows the surfaces after 17 iterations and 2000 response samples. The
search has now begun to cluster near the optimal point, as desired. Additionally, the form
of the surface approximating the quadratic function appears to predict the true response
more accurately.

Because this example is so simple, using surrogates yields no notable improvement in the
speed of convergence. However, the importance of improvements in surrogate accuracy is
clearly illustrated. Note, for example, that in Figure 4.6b, the minimum on the surrogate
surfaces is actually located on the surface corresponding to the linear function $f_1$. By
forcing the search to evaluate space-filling points, the accuracy of the surface corresponding
to $f_2$ was eventually improved so that the surrogates correctly predicted a minimum on the
surface of $f_2$. However, this behavior is not guaranteed and presents some complications
for problems with mixed variables. Fortunately, the algorithm provides the fail-safe extended
poll step, which can ensure a search of alternative surfaces provided the extended poll trigger
is set sufficiently large. For this reason, it may be beneficial to set this parameter large
enough in early iterations to ensure that enough design points are sampled to enable
accuracy improvements for each surrogate function $f_i$.

Figure 4.6. Demonstration of the surrogate building process during MGPS-RS
algorithm execution. Panels: (a) true objective function; (b) original surrogate surface
after 100 response samples; (c) surrogate surface after 500 response samples (8 sites added:
3 sampled from the linear function, 5 from the quadratic function); (d) surrogate surface
after 2000 response samples (15 sites added: 3 sampled from the linear function, 12 from the
quadratic function).

4.3 Termination Criteria

An important consideration for algorithm implementation is deciding when to stop the
algorithm. It is not uncommon to forgo any particular termination strategy and simply let
the algorithm run until it expends a fixed budget of iterations or response samples. For
MGPS-RS algorithms, this can be disadvantageous because the search may reach a point
where additional sampling leads to diminishing returns as the parameters $\alpha_r$ and $\delta_r$ get
very small. Figure 3.6 demonstrated that marginal improvement can lead to an explosion in
sampling requirements after a certain point in the search. This is further illustrated in
Figure 4.7, where the MGPS-RS (without surrogates) sampling requirements per Rinott R&S
procedure are plotted for problems of dimension two and ten. The sampling requirements are
based on a direction set $D = [I, -I]$ consisting of positive and negative coordinate axes. In
the figure, the R&S parameters are reduced geometrically ($\alpha_r = \alpha_0 (0.95)^r$ and
$\delta_r = \delta_0 (0.95)^r$) from initial settings of $\alpha_0 = 0.8$ and $\delta_0 = 2.0$, the variance is
assumed constant at $S^2 = 1$, and the number of first-stage samples is set to $s_0 = 5$. The
number of response samples for each R&S procedure is computed as the product of the samples
required per candidate using Rinott's second-stage formula and the number of candidates
$n_C = 2n + 1$ (two for each dimension plus the incumbent).

Figure 4.7. Growth of response samples required per Rinott R&S procedure for
a fixed response variance of $S^2 = 1$. (Horizontal axis: R&S procedure count, $r$.)
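
The growth pattern can be reproduced in outline with a few lines of code. The sketch below uses the standard Rinott sample-size rule, max(s0, ceil((g S / delta)^2)), and treats Rinott's constant g as a fixed user-supplied number. That is a simplifying assumption: in practice g depends on the number of candidates, the significance level, and s0, so holding it fixed understates the growth caused by a decaying significance level. Function names and the value g = 3.0 used in the example are hypothetical.

```python
import math


def rinott_samples_per_candidate(g, s, delta, s0):
    """Total samples for one candidate: max(s0, ceil((g * s / delta)^2))."""
    return max(s0, math.ceil((g * s / delta) ** 2))


def samples_per_procedure(n, r, g, s=1.0, delta0=2.0, rho=0.95, s0=5):
    """Samples for the r-th R&S procedure on an n-dimensional problem.

    The indifference zone decays geometrically, delta_r = delta0 * rho**r,
    and the candidate set holds n_C = 2n + 1 points.
    g is held fixed here (assumption; see the lead-in).
    """
    delta_r = delta0 * rho ** r
    n_candidates = 2 * n + 1
    return n_candidates * rinott_samples_per_candidate(g, s, delta_r, s0)


# Hypothetical constant g = 3.0: early procedures need only the first stage.
early = samples_per_procedure(n=2, r=0, g=3.0)
late = samples_per_procedure(n=10, r=100, g=3.0)
```

Because the per-candidate count scales with the inverse square of the indifference zone, a geometric decay in the indifference zone produces geometric growth in sampling once the first-stage floor s0 is exceeded.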

Figure  4.7  demonstrates  that,  even  for  small  problems,  sampling  requirements  can 
become  prohibitive  (exceeding  30,000  for  a  single  R&S  procedure  on  a  10-dimensional 
problem  after  a  modest  number  of  iterations)  if  the  parameters  are  reduced  too  aggressively 
or  if  the  initial  settings  are  too  small.  It  would  be  advantageous  if  the  algorithm  could 
detect,  based  on  some  set  of  rules,  a  situation  in  which  further  progress  would  require  a 
sampling  effort  that  exceeds  some  threshold  that  reflects  a  budget  restriction.  For  this 
purpose,  Rinott’s  formula  may  be  adapted  for  use  as  a  heuristic  tool  to  predict  when 
sampling  requirements  become  excessive  according  to  a  user-defined  threshold.  Using  the 
formula, a per-iteration budgeting threshold $B$ of response samples may be expressed as

$$B \le g^2 n_C \left( \frac{S}{\delta_r} \right)^2,$$

where $g = g(n_C, \alpha_r, s_0)$ increases with $n_C$ and $1 - \alpha_r$. Observing that the size of the
candidate set $n_C = n_C(n)$ is dependent on problem size, the budgeting threshold can be
normalized with respect to problem size by setting it equal to a multiple of $g^2 n_C$, i.e.,
$B = K(g^2 n_C)$ for some $K \in \mathbb{R}$. Therefore, a measure for estimating a point of minimal
returns on sampling may be expressed in terms of the ratio between the response standard
deviation and the indifference zone parameter as

$$\frac{S}{\delta_r} \ge \sqrt{K}. \qquad (4.7)$$

The setting for $K$ may be selected by the user based on the available sampling budget; larger
values of $K$ allow larger budgeting thresholds. As the algorithm progresses, response variance
can be estimated by computing the sample variance $S^2$ of one of the candidates (e.g., the
incumbent) from an initial sampling stage and comparing its square root to the current value
of the indifference zone parameter. If the ratio exceeds the scalar $\sqrt{K}$, then one condition
of termination may be considered satisfied.
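
The step from the budgeting threshold to the ratio test can be written out explicitly. The lines below are a sketch under two assumptions: the per-candidate sample size is approximated by $(gS/\delta_r)^2$, ignoring the ceiling and the first-stage floor $s_0$, and the threshold is read as being reached when the predicted sampling effort equals or exceeds $B$.

```latex
\begin{align*}
g^2 n_C \left( \frac{S}{\delta_r} \right)^2 &\ge B = K \, (g^2 n_C) \\
\left( \frac{S}{\delta_r} \right)^2 &\ge K \\
\frac{S}{\delta_r} &\ge \sqrt{K}
\end{align*}
```

Dividing by $g^2 n_C$ is what removes the dependence on problem size, so the same $K$ can be reused across problems of different dimension.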

Expression (4.7) has intuitive appeal because the difference between the two best
candidates implied by $\delta_r$ can be expressed in terms of the standard deviation of the noise
as $K^{-1/2} S$. However, this approach implicitly assumes that the value $\alpha_r$ has reached an
appropriate level. That is, it is desirable for $\alpha_r$ to have reached a level that ensures
sufficient error control (e.g., $\alpha_r = 0.05$). To ensure that a desirable level of error control
for iterate selection is reached by the end of the search, a secondary criterion is proposed
that requires a sufficiently low value of $\alpha_r$,

$$\alpha_r < \alpha_T, \qquad (4.8)$$

where $\alpha_T$ is a threshold setting that defines the minimum desired probability of correct
selection $1 - \alpha_T$ from among the candidates at termination. This measure provides a
useful means to bound the probability of correct selection at an appropriate level by the
end of the search, but does not prevent $\alpha_r$ from decreasing to such a small value that
Rinott's constant $g$ becomes very large, resulting in increased sampling independent of the
ratio $S/\delta_r$. This can be seen in Figure 4.8, which shows the number of second-stage samples
required using Rinott's procedure for two- and ten-dimensional problems with direction
set $D = [I, -I]$, a fixed response standard deviation to indifference zone ratio of
$S/\delta_r = 1$, and first-stage sampling size of $s_0 = 5$. The figure shows that sample size
increases only moderately with $1 - \alpha$ until $\alpha$ gets very close to zero, when it increases
dramatically.

Very small values of $\alpha_r$ have the same effect (increasing samples) using the SAS R&S
procedure and a similar effect when using the SSM procedure. In the latter case, decreasing
$\alpha_r$ increases the constant defined in (4.3), which in turn increases the tolerance in (4.4)
used to screen candidates. The increased tolerance makes it more difficult to screen out the
inferior candidates, so they are retained for additional samples when they would otherwise be
eliminated from contention as the best design. An approach to algorithm design could allow
the rate of decay of the $\alpha_r$ parameter to adapt during the search so that this parameter
does not decay too aggressively and cause excessive sampling. However, adaptive parameter
updates are an item for further study; in this research, the decay rate is determined a
priori, and its effect on sampling requirements and algorithm termination is analyzed in the
computational evaluation of Chapter 5.

A final criterion for termination invokes the traditional measure used in deterministic
optimization via pattern search. In the original pattern search, Hooke and Jeeves [60]
suggested terminating the search when the step size reached a sufficiently small value,

$$\Delta_k < \Delta_T, \qquad (4.9)$$

for some small threshold setting $\Delta_T$. This has long been used as a stopping criterion and
has been justified analytically by Dolan et al. [40], who showed that $\Delta_k$ provides a bound
on first-order stationarity measured in terms of the norm of the gradient $\|\nabla f(x_k)\|$.

In  the  present  case,  with  noisy  response  functions,  a  combination  of  (4.7),  (4.8),  and 
(4.9)  is  proposed  as  criteria  for  terminating  MGPS-RS  algorithms.  The  condition  (4.9) 
requires  that  enough  unsuccessful  iterations  have  occurred  so  that  the  poll  set  essentially 
converges  to  a  set  of  points  in  near  proximity  to  each  other.  However,  condition  (4.8) 
provides  a  safeguard  to  prevent  a  sequence  of  erroneous  selections  from  causing  the  step  size 
to  decrease  to  a  small  value,  prematurely  terminating  the  algorithm  if  only  the  traditional 
criterion  (4.9)  were  used.  Finally,  the  intent  of  (4.7)  is  to  provide  a  heuristic  means  to 
signal  the  onset  of  inflated  sampling  requirements  caused  by  a  high  ratio  of  response  noise 
to  indifference  zone  parameter. 
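
The three conditions can be checked together after each iteration. The sketch below is one possible arrangement, not the thesis code: it assumes the criteria are combined conjunctively, and it exposes a flag for dropping the step-size condition. All names are hypothetical.

```python
import math


def termination_status(s, delta_r, alpha_r, step_size, k_budget, alpha_t, step_t):
    """Evaluate the three stopping conditions separately.

    s: estimated response standard deviation; delta_r: indifference zone;
    alpha_r: significance level; step_size: current mesh step size;
    k_budget, alpha_t, step_t: user-defined thresholds.
    """
    return {
        "noise_ratio": s / delta_r >= math.sqrt(k_budget),  # condition (4.7)
        "error_control": alpha_r < alpha_t,                 # condition (4.8)
        "step_size": step_size < step_t,                    # condition (4.9)
    }


def should_terminate(status, require_step_size=True):
    """Assumption: stop only when every required condition holds."""
    required = ["noise_ratio", "error_control"]
    if require_step_size:
        required.append("step_size")
    return all(status[name] for name in required)


status = termination_status(s=1.0, delta_r=0.1, alpha_r=0.01,
                            step_size=0.005, k_budget=25.0,
                            alpha_t=0.05, step_t=0.01)
stop = should_terminate(status)
```

Returning the individual flags, rather than a single boolean, makes it easy to log which condition is holding the search open.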

4.4  Algorithm  Design 

With the various implementation details defined, the overall design of the algorithm
may now be described. The algorithm was coded in the MATLAB® programming language;
a flow chart in Figure 4.9 shows the general sequence of steps taken by the algorithm
code. The figure shows the interface between the algorithm and a stochastic simulation
model via the R&S procedure. A more detailed mathematical description of the algorithm
is shown in Figure 4.10, which is an update to Figure 3.4 to include the use of surrogates
in the SEARCH step.

4.4.1 Building the Surrogate During Initialization

The first step in algorithm execution is the initialization step. When using surrogates,
initialization requires building initial surrogate functions for each combination of discrete
variable values. Besides the number of design sites and samples per design site to select,
there are other important surrogate-building considerations that must be taken into account.
First, the boundaries of the region used for the initial Latin hypercube sampling designs
must be defined. If the variables are bounded, then the lower and upper limits of this
region are simply set to the lower and upper bound vectors $l$ and $u$, respectively. If some
or all variables are unbounded, then the range for each variable must be decided on by the
user, and this becomes a parameter in the initialization step.

If  there  are  linear  constraints,  then  a  second  consideration  involves  what  to  do  with 
design  sites  that  are  infeasible  with  respect  to  the  linear  constraints;  that  is,  the  feasible 
sampling region may be irregular (nonrectangular) due to the constraints. One approach
is to simply discard infeasible design sites prior to sampling [32]. This is appropriate if the
stochastic  model  is  undefined  for  infeasible  designs;  however,  it  results  in  a  loss  of  some 
design  sites  that  can  negatively  impact  accuracy  of  the  surrogate.  In  this  research,  it  is 
assumed  that  infeasible  designs  can  be  sampled  and  are  retained  in  the  set  of  design  sites 
in  order  to  improve  surrogate  accuracy,  particularly  in  regions  near  the  linear  constraint 
boundaries.

Figure 4.9. Algorithm flow chart for MGPS-RS using surrogates.

MGPS-RS Algorithm Using Surrogates

Initialization: For each combination of discrete variables $i = 1, \ldots, i_{\max}$, do the following:

• Select $N_i$ design sites and set the number of samples $s$ per design site.

• Collect the initial $N_i \times s$ samples and compute mean responses $F_j$ for each design
site $j = 1, \ldots, N_i$.

• Calibrate $h_i$ using the leave-one-out cross-validation method, construct $f_i$, and
set space-filling parameter $\lambda_i > 0$.

Set the iteration counter $k$ to 0. Set the R&S counter $r$ to 0. Choose a feasible starting point
$X_0 \in \Theta$. Set $\Delta_0 > 0$, $\xi > 0$, $\alpha_0 \in (0, 1)$, and $\delta_0 > 0$.

Until termination criteria are satisfied, do the following:

1. Search step: Find a candidate $Y$ on the mesh $M_k(X_k)$ defined in (3.8) that minimizes
$m_i(Y)$, where $m_i$ is defined in (4.6). Use Procedure RS($\{Y, X_k\}$, $\alpha_r$, $\delta_r$) to return
the estimated best solution $\hat{Y}_{[1]} \in \{Y, X_k\}$. Add $Y$ as a design site and recalibrate the
appropriate $h_i$, update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If $\hat{Y}_{[1]} \ne X_k$, the step
is successful, update $X_{k+1} = \hat{Y}_{[1]}$, $\Delta_{k+1} \ge \Delta_k$ according to (3.12), and $k = k + 1$ and
repeat Step 1. Otherwise, proceed to Step 2.

2. Poll step: Set extended poll trigger $\xi_k \ge \xi$. Use Procedure RS($P_k(X_k) \cup \mathcal{N}(X_k)$,
$\alpha_r$, $\delta_r$), where $P_k(X_k)$ is defined in (3.9), to return the estimated best solution
$\hat{Y}_{[1]} \in P_k(X_k) \cup \mathcal{N}(X_k)$. Update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If
$\hat{Y}_{[1]} \ne X_k$, the step is successful, add $\hat{Y}_{[1]}$ as a design site and recalibrate the
appropriate $h_i$, update $X_{k+1} = \hat{Y}_{[1]}$, $\Delta_{k+1} \ge \Delta_k$ according to (3.12), and $k = k + 1$ and
return to Step 1. Otherwise, proceed to Step 3.

3. Extended poll step: For each discrete neighbor $Y \in \mathcal{N}(X_k)$ that satisfies the
extended poll trigger condition $F(Y) < F(X_k) + \xi_k$, set $j = 1$ and $Y_k^j = Y$ and do the
following.

a. Use Procedure RS($P_k(Y_k^j)$, $\alpha_r$, $\delta_r$) to return the estimated best solution
$\hat{Y}_{[1]} \in P_k(Y_k^j)$. Update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If
$\hat{Y}_{[1]} \ne Y_k^j$, set $Y_k^{j+1} = \hat{Y}_{[1]}$ and $j = j + 1$ and repeat Step 3a. Otherwise, set
$Z_k = Y_k^j$ and proceed to Step 3b.

b. Use Procedure RS($X_k \cup Z_k$) to return the estimated best solution $\hat{Y}_{[1]} = X_k$ or
$\hat{Y}_{[1]} = Z_k$. Update $\alpha_{r+1} < \alpha_r$, $\delta_{r+1} < \delta_r$, and $r = r + 1$. If $\hat{Y}_{[1]} = Z_k$, the step
is successful, add $\hat{Y}_{[1]}$ as a design site and recalibrate the appropriate $h_i$, update
$X_{k+1} = \hat{Y}_{[1]}$, $\Delta_{k+1} \ge \Delta_k$ according to (3.12), and $k = k + 1$ and return to Step 1.
Otherwise, repeat Step 3 for another discrete neighbor that satisfies the extended
poll trigger condition. If no such discrete neighbors remain, set $X_{k+1} = X_k$,
$\Delta_{k+1} < \Delta_k$ according to (3.11), and $k = k + 1$ and return to Step 1.

Figure 4.10. MGPS-RS Algorithm using Surrogates for Stochastic Optimization

A third consideration involves defining the region of the design space within which
the surrogate function can be trusted. This is relevant for unbounded problems in which a
user-defined range must be established for the initial surrogate. Defining such a region is
important because kernel regression methods are interpolatory: outside the sampling region,
the regression surface approaches a constant hyperplane with a value equal to the mean
response of the nearest design site. In this research, the radius $rad_s$ of the "searchable"
region in the SEARCH step is approximated as one-half of the maximum Euclidean distance
between any pair of the initial design sites. During the SEARCH step, the search is restricted
to a ball centered at the starting point with radius $rad_s$.
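
The searchable-region rule translates directly into code. The following is a minimal sketch with hypothetical function names; it assumes design sites are given as coordinate sequences in the continuous variables only.

```python
import math
from itertools import combinations


def search_radius(design_sites):
    """Half of the largest Euclidean distance between any two design sites."""
    return 0.5 * max(math.dist(a, b) for a, b in combinations(design_sites, 2))


def in_search_region(point, start, radius):
    """True if point lies in the closed ball of the given radius around start."""
    return math.dist(point, start) <= radius


# Hypothetical design sites in two continuous dimensions.
sites = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
radius = search_radius(sites)
```

The pairwise scan costs on the order of N squared distance evaluations, which is negligible for the handful of initial sites used here and is done only once during initialization.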

A final consideration involves scaling the design sites so that they have approximately
the same ranges when building and evaluating the surrogate function. Scaling is important
because variable ranges may be quite different due to differing bounds, but the bandwidth
parameter prescribes the same width of the underlying Gaussian in each dimension. In the
algorithm, scaling is accomplished by normalizing each variable of the design site vector;
that is, subtracting the mean and dividing by the standard deviation. For design site
$X_j = (x_j^1, \ldots, x_j^{n^c})$, the normalized elements are represented as

$$\tilde{x}_j^{\ell} = \frac{x_j^{\ell} - \bar{x}^{\ell}}{\sigma^{\ell}}, \qquad \ell = 1, \ldots, n^c,$$

where $\bar{x}^{\ell} = \frac{1}{N} \sum_{j=1}^{N} x_j^{\ell}$ is the mean and $\sigma^{\ell}$ is the standard deviation of the
$\ell$th coordinate over the $N$ design sites. Since the surrogate is built with respect to
normalized design sites, the trial points in the SEARCH step are also normalized before
being evaluated with respect to the surrogate function.
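
A sketch of the scaling step is given below. It assumes the sample standard deviation (divisor N - 1), since the text does not make the divisor clear, and it assumes no coordinate is constant across the design sites. Function names are hypothetical.

```python
from statistics import mean, stdev


def fit_scaler(design_sites):
    """Per-coordinate mean and standard deviation over the design sites.

    Assumption: sample standard deviation (divisor N - 1) is used.
    """
    columns = list(zip(*design_sites))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def normalize(point, means, stdevs):
    """Subtract the mean and divide by the standard deviation, per coordinate."""
    return [(x - m) / s for x, m, s in zip(point, means, stdevs)]


# Hypothetical design sites whose two coordinates have very different ranges.
sites = [(0.0, 10.0), (2.0, 30.0), (4.0, 50.0)]
means, stdevs = fit_scaler(sites)
scaled_trial = normalize((4.0, 10.0), means, stdevs)
```

The same `means` and `stdevs` must be applied to trial points as to the design sites; refitting the scaler on a trial point would place it in a different coordinate system from the surrogate.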


The final steps conducted before initiating the search are, for each $i = 1, \ldots, i_{\max}$, to
calibrate $h_i$ using leave-one-out cross-validation and to assign a value to the initial
space-filling parameter $\lambda_i$. As mentioned in Section 4.2, initial settings of $\lambda_i$,
$i = 1, \ldots, i_{\max}$, are used that are a multiple of the maximum difference between mean
responses of the initial design sites for each combination of discrete variable values,

$$\lambda_i = \theta_i \max_{j, \ell} \left| F_j - F_{\ell} \right|, \qquad i = 1, \ldots, i_{\max},$$

where the scalars $\theta_i$ become parameters that define $\lambda_i$, $i = 1, \ldots, i_{\max}$, during initialization.
As the algorithm progresses, the parameters $\lambda_i$, $i = 1, \ldots, i_{\max}$, are halved after each
SEARCH step.
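
The initial setting and the halving rule amount to two one-line functions. This is an illustrative sketch with hypothetical names; it relies on the fact that the largest pairwise difference among a set of numbers equals the maximum minus the minimum.

```python
def initial_space_filling(mean_responses, theta):
    """theta times the largest gap between mean responses of the initial sites."""
    return theta * (max(mean_responses) - min(mean_responses))


def decay_after_search(lam):
    """Halve the space-filling parameter after a SEARCH step."""
    return 0.5 * lam


# Hypothetical mean responses at three initial design sites, with theta = 2.
lam = initial_space_filling([31.0, 12.5, 40.0], theta=2.0)
lam_next = decay_after_search(lam)
```

Under repeated halving the parameter falls below any fixed tolerance after a logarithmic number of SEARCH steps, which matches the intent that the search eventually follows the surrogate directly.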

4.4.2 Algorithm Search Steps and Termination

After initialization is complete, the search begins by seeking a mesh point that minimizes
the merit function(s) (defined in (4.6)) in Step 1. In this research, pattern search is
used for this purpose although, in general, any search procedure could be used, including a
random draw from viable mesh points. Beginning from the incumbent, pattern search applied
to the merit function is carried out in the continuous domain through a series of POLL
steps using the current value of the step size parameter and the direction set $D^i$ to define
the neighboring mesh points for discrete variable combination $i$. Once a local optimizer has
been found for the current $i$, the discrete neighbors at the optimal point are evaluated
with respect to their merit function to check for further improvement. If no improvement
is found among the discrete neighbors, then extended polling is conducted in the continuous
neighborhood of all discrete neighbors to take advantage of the relatively low cost of
evaluating the merit functions compared to response function sampling. The search of the
merit function(s) terminates with the selection of a single trial point to be paired with the
incumbent design in a candidate set that is passed to the R&S procedure. The SEARCH
step culminates after the estimated best design is returned from the R&S procedure.
Regardless of whether the trial point obtained in the SEARCH step successfully replaces the
incumbent as the new iterate, its mean response is recorded so the point may be added as
a new design site for the surrogate function in order to improve surrogate accuracy.

The POLL step and EXTENDED POLL step function as described in Section 3.6; however,
if surrogates are used, an additional step is added after their termination. If either
step results in a success, then the new iterate is added as a design site. The reasons why
other trial points evaluated during the POLL step and EXTENDED POLL step are not added
are twofold. First, these steps are more localized than the SEARCH step, so the points
evaluated will tend to cluster near the incumbent, giving artificially high weight in that
region for surrogate evaluation. By contrast, the SEARCH step intentionally seeks points that
help fill the experimental design space used to build the surrogate, so even trial points of
unsuccessful SEARCH steps are worthy of adding as design sites for the purpose of surrogate
accuracy enhancement. Second, if too many design sites are added, evaluating the
surrogate function can become overly expensive, which defeats its purpose. This can be
seen by reviewing Equation (4.5) and noting that the expression requires two summations
of $N$ terms, where $N$ is the number of design sites. In addition, the summation elements
require the computation of the Euclidean distance between a trial point and a design site.
A fine mesh can lead to many trial points evaluated during the SEARCH step, which can
result in many evaluations of (4.5). Since many points may be evaluated during the POLL
step and EXTENDED POLL step, it would be counterproductive to add all of them as design
sites; therefore, only new iterates after successful steps are added. Any time a new design
site is added, the current bandwidth may not provide the minimum sum of squared error
over the set of augmented design sites. For this reason, the algorithm calls the calibration
routine to recalibrate the appropriate bandwidth parameter and improve the accuracy of
the surrogate.

It should also be mentioned that at the beginning of the POLL step, the direction
set $D_k$ is updated to ensure conforming directions are included when near the constraint
boundaries. The algorithm used for computing conforming directions was adapted from
Abramson [1, p. 49], which is equivalent to the original algorithm of Lewis and Torczon
[81]. For completeness, the algorithm listing is shown in Figure 4.11. This algorithm is
valid in the absence of degenerate constraints.

The use of surrogates to augment the search is a valuable enhancement to the algorithm.
However, the portion that ensures its rigor in a stochastic setting is the R&S
procedure. In the algorithm, the R&S procedure manages the interface with the stochastic
model by passing the design variable vector of each candidate to the model and prescribing
the number of response samples necessary to meet the correct selection probability guarantee.
Depending on the specific procedure used (Rinott's, SAS, or SSM), the procedure
may also need to manage the overhead necessary to repeatedly switch between candidate
designs to gather the required samples.

Set $\varepsilon_k \ge \varepsilon > 0$. Assume the current iterate satisfies $\ell \le A X_k \le u$.

While $\varepsilon_k \ge \varepsilon$, do the following:

1. Let $I_\ell(X_k, \varepsilon_k) = \{ i : a_i^T X_k - \ell_i \le \varepsilon_k \}$

2. Let $I_u(X_k, \varepsilon_k) = \{ i : u_i - a_i^T X_k \le \varepsilon_k \}$

3. Let $V$ denote the matrix whose columns are formed by all members of the set
$\{ -a_i : i \in I_\ell(X_k, \varepsilon_k) \} \cup \{ a_i : i \in I_u(X_k, \varepsilon_k) \}$, where $a_i^T$ denotes the $i$th row of $A$.

4. If $V$ does not have full column rank, then reduce $\varepsilon_k$ just until
$|I_\ell(X_k, \varepsilon_k)| + |I_u(X_k, \varepsilon_k)|$ is decreased, and return to Step 1.

Set $B = V (V^T V)^{-1}$ and $N = I - V (V^T V)^{-1} V^T$.

Set $D_k = [N, -N, B, -B]$.

Figure 4.11. Algorithm for Generating Conforming Directions (adapted from [1]
and [81]).

Once an iteration has been deemed successful or unsuccessful and the appropriate mesh
updates are completed, the final decision to be made before initiating another iteration is
whether or not to terminate the algorithm. For this purpose, criteria (4.7)-(4.9) may
be assessed for compliance against user-defined thresholds. A word of caution introduced in
Section 4.3 is repeated here. If the parameter $\alpha_r$ decays too fast, then very
small values may force large per-iteration samples before the step size $\Delta_k$ decreases to a
sufficiently small value. In the absence of adaptive decay rates for $\alpha_r$, it may be prudent
to make criterion (4.9) optional so that, when the per-iteration sampling requirements
become enormous prior to $\Delta_k < \Delta_T$, the algorithm may be stopped.
4.4.3 Algorithm Parameters

The algorithm design may be concluded by summarizing the various parameter settings
required. For reasons discussed in Section 4.3, perhaps the most critical parameters with
regard to performance are the R&S parameters $\delta_r$ and $\alpha_r$. These parameters have the
most influence over the number of samples required for each R&S procedure executed by
the algorithm. In practice, it is desirable to avoid excessive sampling in regions of the
search space far from optimality. An advantage of the MGPS-RS algorithms is that through
manipulation of these parameters, the sampling requirements can be increased gradually as
the algorithm progresses, so that excessive sampling effort is not wasted at early iterations.
In this study, each parameter is reduced geometrically with $r$,

$$\delta_r = \delta_0 (\rho_\delta)^r \quad \text{and} \quad \alpha_r = \alpha_0 (\rho_\alpha)^r.$$

The initial values $\delta_0$ and $\alpha_0$ are set very loose so that, in the early iterations, no samples
are taken beyond the initial $s_0$ required for each candidate in all three procedures used.
As the algorithm progresses, error control of iterate selection increases as the search moves
toward the region of optimality.
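
The geometric schedule also determines how many R&S procedures must run before the significance level falls below a chosen threshold. The sketch below uses hypothetical function names; the example values (initial level 0.8, decay factor 0.95, threshold 0.05) are taken from settings mentioned in Section 4.3 and are used here only for illustration.

```python
def rs_schedule(r, delta0, alpha0, rho_delta, rho_alpha):
    """Indifference zone and significance level for the r-th R&S procedure."""
    return delta0 * rho_delta ** r, alpha0 * rho_alpha ** r


def first_r_below(alpha0, rho_alpha, alpha_t):
    """Smallest r with alpha0 * rho_alpha**r < alpha_t.

    Requires 0 < rho_alpha < 1 and alpha_t > 0, otherwise the loop
    would not terminate.
    """
    r = 0
    while alpha0 * rho_alpha ** r >= alpha_t:
        r += 1
    return r


delta_r, alpha_r = rs_schedule(10, delta0=2.0, alpha0=0.8,
                               rho_delta=0.95, rho_alpha=0.95)
r_needed = first_r_below(alpha0=0.8, rho_alpha=0.95, alpha_t=0.05)
```

A count like `r_needed` gives a rough lower bound on the number of R&S procedures a run must execute before an error-control condition such as (4.8) can be met, which is useful when choosing the decay factor against a sampling budget.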

The adjustable algorithm parameters are summarized in Table 4.1. The parameters
may be grouped into three general categories:

• mesh defining parameters $D$, $\Delta_0$, $\tau$, $m_k^-$, $m_k^+$, $\xi_k$, and $\xi$;

• R&S parameters $\delta_0$, $\alpha_0$, $\rho_\delta$, $\rho_\alpha$, $s_0$; and

• surrogate defining parameters $p$, strength, range, $\theta_i$, and $[h_{low}, h_{high}]$.

Some  additional  parameters  are  implicitly  defined  by  parameters  in  the  table.  For  ex¬ 
ample,  the  number  of  design  sites  Ni  to  build  the  initial  surrogate  are  defined  as  the  prod- 


Table  4.1.  Summary  of  MGPS-RS  parameters. 


Parameter 

Description 

D 

Direction  set  used  for  mesh  definition,  must  be  positive  spanning; 
common  choices  are  D  =  [/,  — /]  and  D  =  [/,  — e]  where  e  is  a 
vector  of  ones 

Aq 

Initial  step  size,  must  satisfy  Aq  >  0 

T 

Mesh  update  parameter,  constant  for  all  k, 
must  satisfy  r  >  1  and  r  G  Q 

Mesh  refinement  parameter,  must  satisfy  — oo  <  <  —  1 

and  mjT  G  Z,  can  vary  by  iteration 

m+ 

Mesh  coarsening  parameter,  must  satisfy  0  <  <  -|-oo 

and  G  Z,  can  vary  by  iteration 

Extended  poll  trigger  and  lower  bound  on  trigger, 
must  satisfy  ^^>^>0,  can  vary  by  iteration 

<^0 

Initial  indifference  zone  setting,  must  satisfy  (io  >  0 

ao 

Initial  significance  level  setting,  must  satisfy  0  <  oq  <  1 

Ps 

Indifference  zone  decay  parameter,  must  satisfy  0  <  <  1 

Pa. 

Significance  level  decay  parameter,  must  satisfy  0  <  <  1 

So 

Number  of  response  samples  for  R&S  procedure  initial  stage, 
must  satisfy  sq  >  2 

p 

Number  of  intervals  for  each  continuous  dimension  in  LHS  design, 
must  satisfy  l<p<oo,  pGZ 

strength 

Strength  of  LHS  design,  small,  positive  integer  {e.g.  1  or  2) 

range 

Limits  on  sampling  region  for  LHS  design 

9i 

Factor  to  determine  initial  setting  of  space-filling  parameter  Aj, 
must  satisfy  9i>  0 

[hlow;  hhigh] 

Allowable  range  on  bandwidth  parameter  hi, 
must  satisfy  0  <  /iiow  <  hiow  <  oo 
uct of the strength of the Latin hypercube sampling design and the number of intervals $p$
selected for each continuous dimension. Another example is the initial space-filling
parameter for discrete variable combination $i$, which is defined as $\theta_i$ times the
maximum difference in mean response between any two design sites in the initial LHS design
for that combination. The values for $h_i$, $i = 1, \ldots, i_{\max}$, are determined by the
leave-one-out cross-validation method in the calibration routine but must lie within the
bounds $[h_{low}, h_{high}]$. These bounds are necessary to prevent overfitting or
underfitting the surrogate function to the design sites. Values of $h_i$ that are too small
cause overfitting in the sense that the surface of the surrogate function passes through or
very close to the response value at each design site, with very sharp drop-offs between
sites; this results from too much weight being given to the nearest design site. Values of
$h_i$ that are too large cause underfitting in the sense that the surface of the surrogate
function passes farther away from the response value at each design site, with a more
gradual slope between sites; this results from too much weight being given to design sites
far from the nearest site. An illustration of underfitting and overfitting was shown in
Figure 4.4.
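
The bandwidth effect described above can be illustrated with a small sketch. It assumes a one-dimensional Nadaraya-Watson estimator with a Gaussian kernel as a stand-in for the surrogate family, and a simple grid search for the leave-one-out calibration; the function names, the grid, and the data are hypothetical and are not taken from the implementation described in this chapter.

```python
import numpy as np

def nw_predict(x_query, sites, responses, h):
    """Nadaraya-Watson estimate with a Gaussian kernel of bandwidth h (assumed form)."""
    d = (np.asarray(x_query, float)[:, None] - sites[None, :]) / h
    w = np.exp(-0.5 * d**2)
    return (w @ responses) / w.sum(axis=1)

def loo_cv_error(sites, responses, h):
    """Leave-one-out squared prediction error for bandwidth h."""
    err = 0.0
    for i in range(len(sites)):
        keep = np.arange(len(sites)) != i
        pred = nw_predict([sites[i]], sites[keep], responses[keep], h)[0]
        err += (pred - responses[i]) ** 2
    return err

def calibrate_bandwidth(sites, responses, h_low, h_high, n_grid=30):
    """Pick the grid value inside [h_low, h_high] with the smallest LOO-CV error."""
    grid = np.linspace(h_low, h_high, n_grid)
    errors = [loo_cv_error(sites, responses, h) for h in grid]
    return grid[int(np.argmin(errors))]

# Illustrative design sites and noise-free responses.
sites = np.linspace(-2.0, 2.0, 9)
responses = sites**2

overfit = nw_predict(sites, sites, responses, 0.05)   # surface passes through every site
underfit = nw_predict(sites, sites, responses, 50.0)  # surface is nearly flat at the mean
h_star = calibrate_bandwidth(sites, responses, 0.1, 3.0)
```

With the very small bandwidth the fitted values reproduce the responses at the design sites, while the very large bandwidth collapses the surface toward the overall mean; restricting the search to a bounded interval keeps the calibrated value away from both extremes.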

4.5 Summary

Building  on  the  framework  presented  in  Chapter  3,  this  chapter  added  detail  regarding 
the  various  algorithm  implementations  with  specific  regard  to  computational  concerns.  The 
use  of  modern  R&S  techniques  to  improve  sampling  efficiency  and  use  of  surrogate  functions 
to  accelerate  convergence  were  highlighted  as  general  approaches  to  enhance  computational 
performance of the basic algorithm. Additionally, a strategy for algorithm termination was
proposed which seeks to detect the onset of excessive sampling requirements and avoid
additional sampling if only marginal improvement is expected. In the next chapter, the impact
of  the  various  implementations  is  assessed  in  a  comprehensive  computational  evaluation. 
Chapter 5 - Computational Evaluation

A  computational  evaluation  was  conducted  to  assess  the  performance  of  the  various 
implementation  strategies  presented  in  the  preceding  chapter.  This  evaluation  consisted  of  a 
series  of  experiments  that  applied  the  different  algorithm  variants  to  a  suite  of  standardized 
test  problems.  To  complement  the  evaluation,  four  additional  algorithms  from  the  literature 
were  implemented  in  order  to  compare  their  performance  to  the  MGPS-RS  algorithms. 
Following  an  overview  of  the  test  scenario  in  Section  5.1,  the  competing  algorithms,  and 
their  implementation  details  under  this  computational  study,  are  presented  in  Section  5.2. 
The test problems used for the evaluation, which consist of twenty-two continuous-variable
and  four  mixed-variable  problems,  are  described  in  Section  5.3.  In  Section  5.4,  the  design 
of  experiments  is  presented  which  defines  the  performance  measures,  outlines  the  statistical 
model  to  be  evaluated,  and  lists  the  parameter  settings  for  each  of  the  algorithms.  The 
numerical  results  are  analyzed  in  Section  5.5.  Special  attention  is  given  to  a  study  of  the 
effects  of  the  various  MGPS-RS  implementation  alternatives,  the  comparison  of  MGPS-RS 
to  its  competitors  used  in  this  evaluation,  and  the  effectiveness  of  the  termination  criteria 
proposed  in  the  preceding  chapter. 

5.1 Test Scenario

To test the algorithm implementations of Chapter 4, the generic response function

$$F(x) = f(x) + N(0, \sigma^2(f(x)))$$

is used, where $N(0, \sigma^2(f(x)))$ is a normally distributed, mean-zero noise term added
to an underlying true objective function. Standard test functions were drawn from the
literature to compose $f(x)$, some of which are constrained by variable bounds, linear
constraints, or both. A total of 26 test problems were defined, four of which contain mixed
variables. The test problems are described in greater detail in Section 5.3 and Appendix A.

To compare two different random noise scenarios, the standard deviation of the noise
term $\sigma(f(x))$ is either proportional or inversely proportional to
$\sqrt{f(x) - f(x^*) + 1}$, but bounded on the range (0.1, 10):

$$\sigma_1(f(x)) = \min\left(10, \sqrt{f(x) - f(x^*) + 1}\right), \text{ or}$$

$$\sigma_2(f(x)) = \max\left(0.1, \frac{1}{\sqrt{f(x) - f(x^*) + 1}}\right),$$

where $f(x^*)$ is the known optimal objective function value. These test cases are referred
to as noise cases 1 and 2, respectively. At optimality, $\sigma_1 = \sigma_2 = 1$, but the
two diverge toward $\sigma_1 = 10$ and $\sigma_2 = 0.1$ for trial points away from
optimality. The noise cases were selected to provide both a high noise case (case 1) and a
low noise case (case 2) and also to demonstrate that the MGPS-RS algorithms allow for the
inclusion of modern R&S procedures that do not require known and/or constant variance of
response samples across different designs.
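
A minimal sketch of the two noise cases follows. It assumes the square-root forms $\sigma_1 = \min(10, \sqrt{f(x) - f(x^*) + 1})$ and $\sigma_2 = \max(0.1, 1/\sqrt{f(x) - f(x^*) + 1})$, which reproduce the stated values at and away from optimality; the function names and the use of a NumPy random generator are illustrative choices only.

```python
import numpy as np

def sigma_case1(f_x, f_opt):
    """Noise case 1: standard deviation grows away from the optimum, capped at 10."""
    return min(10.0, float(np.sqrt(f_x - f_opt + 1.0)))

def sigma_case2(f_x, f_opt):
    """Noise case 2: standard deviation shrinks away from the optimum, floored at 0.1."""
    return max(0.1, float(1.0 / np.sqrt(f_x - f_opt + 1.0)))

def noisy_response(f, x, f_opt, noise_case, rng):
    """One sample of F(x) = f(x) + N(0, sigma^2(f(x)))."""
    f_x = f(x)
    sigma = sigma_case1(f_x, f_opt) if noise_case == 1 else sigma_case2(f_x, f_opt)
    return f_x + rng.normal(0.0, sigma)

# Example: a quadratic with known optimal value 0 (hypothetical test function).
rng = np.random.default_rng(0)
sample = noisy_response(lambda x: float(np.sum(np.square(x))), [1.0, 2.0], 0.0, 1, rng)
```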

5.2  Competing  Algorithms 

Testing  was  performed  for  each  of  the  following  six  MGPS-RS  variants: 

•  MGPS  with  Rinott’s  procedure  and  no  surrogates  (MGPS-RIN), 

•  MGPS  with  Screen-and-Select  procedure  and  no  surrogates  (MGPS-SAS), 

•  MGPS  with  Sequential  Selection  with  Memory  procedure  and  no  surrogates 
(MGPS-SSM), 

•  Surrogate  assisted  MGPS  with  Rinott’s  procedure  (S-MGPS-RIN), 

•  Surrogate  assisted  MGPS  with  Screen-and-Select  procedure  (S-MGPS-SAS), 
and 

•  Surrogate assisted MGPS with Sequential Selection with Memory procedure (S-MGPS-SSM).

For comparison to other methods, four additional algorithms were included in the
computational experiments:

•  Finite-Difference Stochastic Approximation (FDSA),

•  Simultaneous  Perturbation  Stochastic  Approximation  (SPSA), 

•  Nelder-Mead  simplex  search  (NM),  and 

•  a  Random  Search  (RNDS)  algorithm. 

Each of the ten methods was coded in the MATLAB® programming language. The
MGPS-RS implementations are described in Section 4.4. Code for FDSA and SPSA was
obtained from the web site associated with the textbook of Spall [134] and modified as
necessary. Code for NM was adapted from the MATLAB® function fminsearch. Code for
RNDS was written by the author. The details of the algorithms are provided in the following
paragraphs. Due to algorithm limitations, FDSA, SPSA and NM were not applied to the
test problems with mixed variables. In addition, NM was not applied to the constrained
test problems. The RNDS algorithm was adapted for all test problem types.

The FDSA and SPSA methods are based on the algorithm in Figure 2.1 using (2.2)
to estimate the gradient for FDSA and (2.3) for SPSA. As recommended by Spall [134,
p. 113], the step sizes for FDSA and SPSA are updated according to

$$a_k = \frac{a}{(k + 1 + A_{SA})^{\alpha_{SA}}} \qquad (5.1)$$

with constant scalar parameters $a > 0$, $\alpha_{SA} > 0$, and $A_{SA} \ge 0$. The
parameter $A_{SA} \ge 0$ is designed to provide stability in the early iterations when $a$
is large enough to ensure nonnegligible step sizes after many iterations.

The perturbation distance parameter $c_k$, used by (2.2) and (2.3) to specify interval
width in the gradient approximation, is updated as per Spall [134, p. 163]

$$c_k = \frac{c}{(k + 1)^{\gamma_{SA}}} \qquad (5.2)$$

with constant scalar parameters $c > 0$ and $\gamma_{SA} > 0$. Spall [134, p. 190] suggests
setting $c$ equal to the approximate standard deviation of the response noise. For nearly
all test problems, the setting $c = 1$ is used. In the remaining ones, a smaller value was
needed to prevent the formulae in (2.2) and (2.3) from evaluating the objective at
infeasible points, where it is not defined. These test cases are discussed further in
Section 5.4.

The forms of (5.1) and (5.2) ensure that the iterates of the SA algorithms are
asymptotically normally distributed about the optimal solution [134, p. 162], which provides
a means to determine rates of convergence. It can be shown that the optimal asymptotic rate
of convergence is achieved for settings of $\alpha_{SA} = 1$ and $\gamma_{SA} = 1/6$.
However, for better finite-sample algorithm performance, Spall [134, p. 190] recommends
using values $\alpha_{SA} = 0.602$ and $\gamma_{SA} = 0.101$, which are the lowest possible
settings that satisfy the theoretical conditions necessary to retain normally distributed
iterates. These settings are used throughout the computational testing. For SPSA, the
elements of the perturbation direction vector $d_k$ are drawn randomly from a Bernoulli
$\pm 1$ distribution with probability $1/2$ for each outcome. Each point in the differencing
formula is averaged over $s_0 = 5$ response samples for both FDSA and SPSA.
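
The gain sequences and the SPSA differencing step can be sketched as follows. The gradient estimator uses the standard simultaneous-perturbation form, which is assumed here to correspond to (2.3); the function names and the callable signature `F(x, rng)` are hypothetical.

```python
import numpy as np

ALPHA_SA = 0.602  # step-size decay exponent used throughout the testing
GAMMA_SA = 0.101  # perturbation decay exponent used throughout the testing

def step_size(k, a, A_sa, alpha=ALPHA_SA):
    """Gain a_k = a / (k + 1 + A_SA)^alpha, following (5.1)."""
    return a / (k + 1 + A_sa) ** alpha

def perturbation_size(k, c, gamma=GAMMA_SA):
    """Perturbation distance c_k = c / (k + 1)^gamma, following (5.2)."""
    return c / (k + 1) ** gamma

def mean_response(F, x, s0, rng):
    """Average of s0 noisy response samples at x."""
    return float(np.mean([F(x, rng) for _ in range(s0)]))

def spsa_gradient(F, x, c_k, s0, rng):
    """Standard SPSA estimate with a Bernoulli +/-1 perturbation (assumed form of (2.3))."""
    d = rng.choice([-1.0, 1.0], size=x.shape)
    y_plus = mean_response(F, x + c_k * d, s0, rng)
    y_minus = mean_response(F, x - c_k * d, s0, rng)
    return (y_plus - y_minus) / (2.0 * c_k * d)
```

Only two averaged responses are needed per SPSA gradient estimate regardless of dimension, which is why its sampling cost per iteration stays at $2 s_0$ while the cost of FDSA grows with the number of variables.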

As suggested by Spall [134, p. 165], $A_{SA}$ is selected to be approximately 10% of the
total number of iterations. Therefore, for $RS_{\max}$ total response samples allowed for
each algorithm run, $A_{SA}$ is determined as

$$A_{SA} = 0.1 \, \frac{RS_{\max}}{2 n^c s_0} \quad \text{for FDSA, and}$$

$$A_{SA} = 0.1 \, \frac{RS_{\max}}{2 s_0} \quad \text{for SPSA.}$$

Given $A_{SA}$ and an initial desired step size $a_0$, the constant $a$ is selected
semiautomatically after an initial $NS$ number of response samples obtained at the starting
point according to the methods suggested in [134, pp. 165, 190]. For SPSA, $a$ is
determined as

$$a = \frac{a_0 (A_{SA} + 1)^{\alpha_{SA}}}{\bar{g}_0(X_0)},$$

where $\bar{g}_0(X_0)$ is the mean of the estimated gradient vector elements averaged over
the $NS$ responses. For FDSA, $a$ is determined as
$a = \min\{a_{temp,1}, a_{temp,2}, \ldots, a_{temp,n^c}\}$, where

$$a_{temp,i} = \frac{a_0 (A_{SA} + 1)^{\alpha_{SA}}}{\bar{g}_{0,i}(X_0)}$$

and $\bar{g}_{0,i}(X_0)$ is the estimated gradient vector element for coordinate $i$
averaged over the $NS$ responses. The value $NS = 200$ is used for SPSA and
$NS = \max(200, 2 n^c s_0)$ for FDSA. A larger number of samples is used for FDSA (for test
problems exceeding 20 variables) because FDSA uses more samples per gradient estimate as
$n^c$ increases, whereas SPSA always uses $2 s_0$ samples. Given the preceding discussion
for determining parameter settings, the only parameter left that requires tuning for both
FDSA and SPSA prior to algorithm execution is the initial desired step size $a_0$ (except
for $c$ in just three of the test problems).
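
The arithmetic behind these settings is short enough to sketch. The snippet assumes $A_{SA} = 0.1 \, RS_{\max}/(2 n^c s_0)$ for FDSA and $0.1 \, RS_{\max}/(2 s_0)$ for SPSA, and $a = a_0 (A_{SA} + 1)^{\alpha_{SA}}$ divided by a gradient magnitude; taking absolute values of the gradient estimates is an added assumption, and all function names are hypothetical.

```python
import math

def stability_constant(rs_max, s0, n_c, method):
    """A_SA as roughly 10% of the iterations affordable within rs_max samples."""
    samples_per_iteration = 2 * n_c * s0 if method == "FDSA" else 2 * s0
    return 0.1 * rs_max / samples_per_iteration

def initial_samples(s0, n_c, method):
    """NS: samples spent at the starting point to estimate the initial gradient."""
    return max(200, 2 * n_c * s0) if method == "FDSA" else 200

def spsa_gain(a0, A_sa, g0_mean, alpha=0.602):
    """Constant a chosen so that the first step has size about a0."""
    return a0 * (A_sa + 1) ** alpha / abs(g0_mean)

def fdsa_gain(a0, A_sa, g0_elements, alpha=0.602):
    """Smallest per-coordinate candidate, so no coordinate moves more than about a0."""
    return min(a0 * (A_sa + 1) ** alpha / abs(g) for g in g0_elements)

# Small problem (n_c = 2) with 100,000 samples; large problem (n_c = 30) with 500,000.
A_fdsa_small = stability_constant(100_000, 5, 2, "FDSA")
A_spsa_small = stability_constant(100_000, 5, 2, "SPSA")
A_fdsa_large = stability_constant(500_000, 5, 30, "FDSA")
```

Under these assumptions, a two-variable problem gives $A_{SA} = 500$ for FDSA and $1000$ for SPSA, and a thirty-variable problem with the larger budget gives about $167$ for FDSA with $NS = 300$.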

For problems with variable bounds, elements $i = 1, \ldots, n^c$ of infeasible iterates
produced by FDSA and SPSA are set to $l_i$ ($u_i$) for lower (upper) bound violations.
Handling the linear constraints is more complicated. In these cases, a mapping to the
feasible region was attempted through a sequence of corrective moves in the negative
direction of the outward pointing normal vector to the maximum violated constraint. This
simple technique is illustrated for two dimensions in Figure 5.1, which shows two cases in
which the two constraints $l_1$ and $l_2$ are violated. On the left side of the figure, a
single move is required to map the infeasible iterate to a feasible point. On the right
side of the figure, an infinite number of moves are actually needed to reach the
intersection of $l_1$ and $l_2$ in the limit. To avoid computational deficiencies resulting
from a very large number of attempted moves, the number of moves is limited to 50 per
iteration in the algorithm implementation. Thus, in the presence of multiple linear
constraints, the corrected iterates are not guaranteed to be feasible but will be closer to
the feasible region than the pre-correction iterates.

Figure 5.1. Illustration of Corrective Move Method for Infeasible Iterates Used
in SA Algorithms.
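
A possible reading of the corrective move scheme is sketched below. The constraint form $Ax \le b$, the use of the normalized violation to pick the maximum violated constraint, and the choice to move exactly onto that constraint's boundary are assumptions; the text fixes only the move direction and the limit of 50 moves.

```python
import numpy as np

def clip_to_bounds(x, lower, upper):
    """Set elements that violate a bound to that bound."""
    return np.minimum(np.maximum(x, lower), upper)

def corrective_moves(x, A, b, max_moves=50, tol=1e-12):
    """Move against the outward normal of the most violated constraint of A x <= b.

    Returns the corrected point and the number of moves made. The point may
    still be infeasible if max_moves is exhausted.
    """
    x = np.asarray(x, float).copy()
    norms = np.linalg.norm(A, axis=1)
    for moves in range(max_moves):
        violation = (A @ x - b) / norms          # signed distance past each boundary
        worst = int(np.argmax(violation))
        if violation[worst] <= tol:
            return x, moves                      # feasible: no further moves needed
        x -= violation[worst] * A[worst] / norms[worst]
    return x, max_moves

# One violated constraint (x1 <= 1): a single move restores feasibility.
x_single, n_single = corrective_moves([3.0, 2.0], np.array([[1.0, 0.0]]), np.array([1.0]))

# Two violated constraints: correcting one can leave the other violated.
A2 = np.array([[1.0, 1.0], [1.0, -1.0]])
b2 = np.array([1.0, 1.0])
x_double, n_double = corrective_moves([3.0, 0.0], A2, b2)
```

When the two constraint normals are far from orthogonal, each move onto one boundary pushes the point back across the other, and the sequence only approaches the intersection; the move limit then truncates the sequence.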

The Nelder-Mead algorithm is based on the algorithm listing in Figure 2.3. As suggested
by Barton and Ivey [23], the shrink parameter was adjusted to $\kappa = 0.9$ (from 0.5),
while all others were set according to the standard choices [152]: reflection parameter
$\eta = 1$, expansion parameter $\gamma_{NM} = 2$, and contraction parameter
$\beta = 1/2$. In addition, the best point is resampled after a shrink. Any time a point in
the simplex is sampled for the first time or resampled, the objective function value is
evaluated as the mean response over $s_0 = 5$ samples. The initial simplex is constructed
using the starting point plus $n^c$ points a distance of $\Delta_{NM}$ units in the
direction of the coordinate axes from the starting point. The parameter $\Delta_{NM}$ was
tuned for each of the test problems to try to find an initial simplex size that allowed the
search to achieve the best results. Details of parameter tuning procedures are provided in
Section 5.4.2.
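
The initial simplex construction and the averaged objective evaluation can be sketched in a few lines. The function names and the callable signature `F(x, rng)` are hypothetical, and the sketch covers only these two ingredients, not the full simplex search.

```python
import numpy as np

def initial_simplex(x0, delta_nm):
    """Starting point plus one vertex offset by delta_nm along each coordinate axis."""
    x0 = np.asarray(x0, float)
    return np.vstack([x0, x0 + delta_nm * np.eye(x0.size)])

def vertex_value(F, x, rng, s0=5):
    """Objective estimate at a simplex vertex: mean of s0 noisy responses."""
    return float(np.mean([F(x, rng) for _ in range(s0)]))

# Three vertices for a two-variable problem with simplex size 0.5.
simplex = initial_simplex([1.0, 2.0], 0.5)
```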
The Random Search algorithm is based on the general algorithm of Figure 2.2, adapted
to a mixed-variable domain. The specific algorithm used in the computational evaluation
is shown in Figure 5.2. The neighborhood structure for the continuous domain is based
on the algorithm in [134, p. 45], which is a simplification of an algorithm in [131]. In
Step 1, the continuous portion of a trial point is generated by perturbing the incumbent,
$(X_k^c)' = X_k^c + b_k + d_k$, where $b_k$ is a bias vector and $d_k$ is a normally
distributed random perturbation vector with mean zero vector and covariance $\rho_k^2 I$.
The parameter $\rho_k$ allows the standard deviation of the perturbation terms, which is set
equal for all dimensions, to be adjusted by iteration. In the algorithm of Figure 5.2, the
value is reduced by a factor of 0.99 after each iteration so that, after many iterations,
the magnitude of the perturbation $d_k$ gets smaller as the optimum is approached. The
initial standard deviation parameter $\rho_0$ was tuned for each of the test problems to
try to find an initial setting that allowed the search to achieve the best results. Details
of parameter tuning procedures are provided in Section 5.4.2. The $b_k$ vector slants the
search for a candidate in the direction of previous success. The discrete portion of a
trial point is randomly assigned by selecting a combination of discrete variable settings
$i$ uniformly from all the possible settings and using the values for $X^d$ that correspond
to $i$.

In Step 2 of the algorithm, the mean responses of the incumbent and trial points are
averaged over $k$ samples. Therefore, precision in estimating the objective function
increases with $k$ so that, early in the search, exploration of the design space is
encouraged due to a greater likelihood of accepting trial points even if the true objective
value does not improve upon the incumbent. As the number of iterations grows, it becomes
increasingly difficult to replace good iterates because of increased precision in the
estimates. If the first trial does not successfully replace the incumbent, then a second is
tried by reversing the direction of the perturbation vector $d_k$. If neither trial
replaces the incumbent, the bias vector is halved before returning for another iteration.
Note that response samples for infeasible points are avoided by setting the response mean
value to a very large number such that it cannot be accepted as the new iterate.

Random Search Algorithm

Initialization: Set $b_0 = 0$ and initial setting for $\rho_0 > 0$. Choose a feasible
starting point $X_0 \in \Theta$. Set the iteration counter $k$ to 0.

1.  Generate independent random vector $d_k \sim NORMAL(0, \rho_k^2 I)$ and independent
    random integer $i$ drawn uniformly from $\{1, \ldots, i_{\max}\}$. Construct trial point
    $X_k' = ((X_k^c)', (X_k^d)')$ according to the following:

    -  $(X_k^c)' = X_k^c + b_k + d_k$, and

    -  $(X_k^d)' = (X^d)_i$, corresponding to the $i$th combination of discrete variables.

2.  Obtain $k$ response samples $\{F_s(X_k)\}_{s=1}^{k}$ for the incumbent design point and
    calculate mean response $\bar{F}(X_k) = k^{-1} \sum_{s=1}^{k} F_s(X_k)$.

    -  If $X_k'$ is infeasible, set $\bar{F}(X_k') = \infty$; otherwise, obtain $k$ response
       samples $\{F_s(X_k')\}_{s=1}^{k}$ for the trial design point and calculate mean
       response $\bar{F}(X_k') = k^{-1} \sum_{s=1}^{k} F_s(X_k')$. If
       $\bar{F}(X_k') < \bar{F}(X_k)$, set $X_{k+1} = X_k'$ and
       $b_{k+1} = 0.2 b_k + 0.4 d_k$ and go to Step 3.

    -  Set $(X_k^c)' = X_k^c + b_k - d_k$ and keep current $(X_k^d)'$. If $X_k'$ is
       infeasible, set $\bar{F}(X_k') = \infty$; otherwise, obtain $k$ response samples
       $\{F_s(X_k')\}_{s=1}^{k}$ for the trial design point and calculate mean response
       $\bar{F}(X_k') = k^{-1} \sum_{s=1}^{k} F_s(X_k')$. If $\bar{F}(X_k') < \bar{F}(X_k)$,
       set $X_{k+1} = X_k'$ and $b_{k+1} = b_k - 0.4 d_k$ and go to Step 3.

    -  Set $X_{k+1} = X_k$ and $b_{k+1} = 0.5 b_k$.

3.  If the stopping criterion is satisfied, then stop and return $X_{k+1}$ as the estimate
    of the optimal solution. Otherwise, update $\rho_{k+1} = 0.99 \rho_k$ and $k = k + 1$
    and return to Step 1.

Figure 5.2. Random Search Algorithm Used in Computational Evaluation
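
The random search can be sketched compactly for a problem with one discrete variable. Several details are assumptions made for the sketch: $k + 1$ samples are averaged at iteration $k$ so that the first iteration uses one sample, feasibility is checked on the continuous part only, and a fixed iteration count replaces the stopping criterion. The function names and the callable signature `F(x_c, x_d, rng)` are hypothetical.

```python
import numpy as np

def mean_response(F, x_c, x_d, n, rng):
    """Average of n noisy response samples at the design (x_c, x_d)."""
    return float(np.mean([F(x_c, x_d, rng) for _ in range(n)]))

def random_search(F, x_c0, x_d0, d_settings, feasible, rho0, n_iter, rng):
    x_c, x_d = np.asarray(x_c0, float), x_d0
    b = np.zeros_like(x_c)                      # bias vector, b_0 = 0
    rho = rho0
    for k in range(n_iter):
        n = k + 1                               # assumption: k + 1 samples at iteration k
        d = rng.normal(0.0, rho, size=x_c.shape)
        trial_d = d_settings[rng.integers(len(d_settings))]
        f_inc = mean_response(F, x_c, x_d, n, rng)
        accepted = False
        # First trial uses +d, second trial reverses the perturbation direction.
        for sign, b_next in ((1.0, 0.2 * b + 0.4 * d), (-1.0, b - 0.4 * d)):
            trial_c = x_c + b + sign * d
            if feasible(trial_c):
                f_trial = mean_response(F, trial_c, trial_d, n, rng)
            else:
                f_trial = np.inf                # infeasible points are never sampled
            if f_trial < f_inc:
                x_c, x_d, b = trial_c, trial_d, b_next
                accepted = True
                break
        if not accepted:
            b = 0.5 * b                         # neither trial succeeded: halve the bias
        rho *= 0.99                             # shrink the perturbation scale
    return x_c, x_d
```

A usage example on a hypothetical noisy quadratic with two discrete settings:

```python
rng = np.random.default_rng(1)
F = lambda x_c, x_d, rng: float(np.sum(x_c**2) + (x_d - 1) + rng.normal(0.0, 0.1))
inside = lambda x: bool(np.all(np.abs(x) <= 4.0))
x_c, x_d = random_search(F, [3.0, 3.0], 2, [1, 2], inside, 1.0, 150, rng)
```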

5.3  Test  Problems 

The test problems are drawn from standardized problem sets in [58] and [122]. In
total, twenty-six test problems are used for the computational evaluation: twenty-two of
them with continuous variables only and four of them with mixed variables. The
mixed-variable problems are constructed similarly to the example of Section 3.8 in that
$f(x)$ takes a specific functional form depending on the settings of the discrete
variables. The following subsections summarize the test problems.

5.3.1 Continuous-Variable Problems

The continuous-variable test problems are drawn from the published collections in [58]
and [122], the latter being a supplement to the former. Taken together, these books present
a  total  of  307  problems  as  “an  extensive  set  of  nonlinear  programming  problems  that  were 
used  by  other  authors  in  the  past  to  develop,  test  or  compare  optimization  algorithms” 
[122,  p.  iii].  The  collection  consists  of  unconstrained  and  constrained  problems  that  range 
in  dimension  from  two  to  100.  A  nice  feature  of  the  publications  is  that  they  provide  a 
classification  scheme  to  help  characterize  the  structure  of  each  problem.  The  objective 
function  (OBJ)  is  classified  as  one  of  the  following  categories: 

•  constant  (C), 

•  linear  (L), 

•  quadratic  (Q), 

•  sum  of  squares  (S), 

•  generalized  polynomial  (P),  or 

•  general  nonlinear  (G). 

The  constraint  information  is  classified  as  one  of  the  following  categories: 

•  unconstrained  (U), 

•  upper  and/or  lower  bounds  only  (B), 

•  linear  constraint  functions  (L), 

•  quadratic  constraint  functions  (Q), 

•  generalized  polynomial  constraint  functions  (P),  or 

•  general  nonlinear  constraint  functions  (G). 

In this dissertation research, an objective of the test problem set is that it represents a
cross section of the relevant objective function and constraint category combinations. Since
the MGPS-RS algorithms are applicable in an unconstrained setting or under bound and
linear constraints, the constraint categories available for testing are U, B, and L. In
addition, very few of the problems with linear objective functions have only linear
constraints (none that are unconstrained or bounded only); therefore, objective function
categories are restricted to Q, S, P, and G. Finally, it is deemed important to stratify
algorithm performance based on problem size. Therefore, an additional category considered in
this research was established based on problem dimension. The categories are defined as:

•  small  (S)  -  2  to  9  variables, 

•  medium  (M)  -  10  to  29  variables,  or 

•  large  (L)  -  30  to  100  variables. 

Note that the “large” category here is not necessarily representative of large practical
problems, which may include thousands of design variables.

With  four  objective  function  categories,  three  constraint  type  categories,  and  three 
problem  size  categories,  the  total  number  of  problem  type  combinations  numbers  4  x  3  x 
3  =  36.  Of  the  thirty-six  combinations,  twenty-two  are  satisfied  by  at  least  one  of  the 
published  problems,  which  led  to  the  conclusion  to  select  twenty-two  continuous-variable 
test  problems. 

A summary of the continuous-variable test problems is displayed in Table 5.1. The
problem number shown is the number assigned in [58] or [122]. The table lists the objective
function type, problem dimension (and size category), and constraint information, including
the number of bounds and/or linear constraints. A more detailed description of the test
problems, included in Appendix A, provides the objective and constraint equations, the
starting solution, and the optimal solution. The published starting point was used for each
problem except problem 392. In this case, the published starting point is infeasible. Since
MGPS-RS algorithms search the interior of the feasible region, the starting point was
modified from the published version to make it feasible.

Table 5.1. Continuous-Variable Test Problem Properties.

Problem   OBJ           CON   DIM           Number of   Number of Linear
Number                                      Bounds      Constraints

3         Q             B     2 (S)         1           0
4         Q             B     2 (S)         2           0
5         [illegible]   B     2 (S)         4           0
25        S             B     3 (S)         6           0
36        P             L     3 (S)         6           1
105       [illegible]   L     8 (S)         16          1
110       [illegible]   B     10 (M)        20          0
118       Q             L     15 (M)        30          29
224       Q             L     2 (S)         4           4
244       S             U     3 (S)         0           0
256       P             U     4 (S)         0           0
275       Q             U     4 (S)         0           0
281       [illegible]   U     [illegible]   0           0
287       P             U     [illegible]   0           0
288       S             U     20 (M)        0           0
289       [illegible]   U     30 (L)        0           0
297       S             U     30 (L)        0           0
300       Q             U     20 (M)        0           0
301       Q             U     50 (L)        0           0
305       P             U     100 (L)       0           0
314       [illegible]   U     2 (S)         0           0
392       Q             L     30 (L)        45          30


5.3.2 Mixed-Variable Problems

The mixed-variable problems are constructed by assigning a specific functional form
to the objective function over the continuous domain, $f(x^c)$, based on the settings of
the discrete variables $x^d$. To simplify test problem construction, one discrete variable
is used with a varying number of settings $i_{\max}$. For testing, two settings of
$i_{\max}$ (2 and 3) are used as well as two settings for the dimension $n^c$ (4 and 20) to
comprise a total of four test problem combinations. The following variably dimensioned test
functions were selected so that their form could be adjusted to the dimension

n^-l 

=  ^[{Xe+l-Xgf  +  {l-x^gf], 
e=i 

f2ix'^)  =  5  +  where  t - r — and 

t  +  j-1 

faix^)  =  2  +  2071^^  —  5  ^  Xg. 

e=i 

The first two functions are variations on functions presented in [122]. Function $f_1$
is a version of the well-known Rosenbrock banana function, several versions of which are
presented in [122] (problems 206-210, 294-299). Function $f_2$ is a quadratic function using
the Hilbert matrix to generate the coefficients on the terms; three versions of this
function are presented in [122] (problems 274-276). The scalar multipliers of these
functions were adjusted so that their surfaces do not deviate from each other too much in
the feasible design space (e.g., the scalar “5” was introduced as a multiplier to the
$(x^c)^T Q x^c$ term). In addition, a constant term was added to function $f_2$ so that its
minimum value does not coincide with that of $f_1$. Both functions $f_1$ and $f_2$ are used
to define the objective function when $i_{\max} = 2$; the linear function $f_3$ is added
when the value of $i_{\max}$ is increased from 2 to 3.

The four mixed-variable test problems are summarized in Table 5.2. For each of the
problems, all continuous variables are bounded on the range $[-4, 4]$. The lone discrete
variable has either two (MVP1 and MVP3) or three (MVP2 and MVP4) settings, which determine
the form taken by the objective function. For all test problems, the optimal solution
corresponds to $x^* = (x_1^c, x_2^c, \ldots, x_{n^c}^c, x^d)^* = (0, 0, \ldots, 0, 1)$,
$f(x^*) = 0$. The continuous portion of the starting point was selected as the standard
starting point for problems 274-276 in [122], i.e., $x_\ell^c = 4/\ell$,
$\ell = 1, \ldots, n^c$. The discrete portion was selected as $x^d = 2$ or $x^d = 3$. The
test problems are shown graphically in Figure 5.3 for a two-dimensional case.

Table 5.2. Mixed-variable Test Problems.

Test      $i_{\max}$   Objective function             DIM       Bounds       Starting
Problem                                               ($n^c$)   on $x^c$     Point

MVP1      2            $f(x) = f_1(x^c)$ if $x^d = 1$   4         $[-4, 4]$    $x_\ell^c = 4/\ell$, $\ell = 1, \ldots, 4$
                       $f(x) = f_2(x^c)$ if $x^d = 2$                          $x^d = 2$

MVP2      3            $f(x) = f_1(x^c)$ if $x^d = 1$   4         $[-4, 4]$    $x_\ell^c = 4/\ell$, $\ell = 1, \ldots, 4$
                       $f(x) = f_2(x^c)$ if $x^d = 2$                          $x^d = 3$
                       $f(x) = f_3(x^c)$ if $x^d = 3$

MVP3      2            $f(x) = f_1(x^c)$ if $x^d = 1$   20        $[-4, 4]$    $x_\ell^c = 4/\ell$, $\ell = 1, \ldots, 20$
                       $f(x) = f_2(x^c)$ if $x^d = 2$                          $x^d = 2$

MVP4      3            $f(x) = f_1(x^c)$ if $x^d = 1$   20        $[-4, 4]$    $x_\ell^c = 4/\ell$, $\ell = 1, \ldots, 20$
                       $f(x) = f_2(x^c)$ if $x^d = 2$                          $x^d = 3$
                       $f(x) = f_3(x^c)$ if $x^d = 3$

Figure 5.3. Mixed-variable Test Problem Illustration for $n^c = 2$.

5.4  Experimental  Design 

The computational experiments constitute a full factorial design, where for each valid
algorithm, test problem, and noise case combination, thirty independent replications were
executed. In each experiment the algorithm was allowed to run until $RS_{\max} = 100{,}000$
response samples were obtained (small and medium problems) or until
$RS_{\max} = 500{,}000$ response samples were obtained (large problems and problems MVP3
and MVP4).


5.4.1 Performance Measures and Statistical Model

Three performance measures are defined to evaluate the numerical results. Since the
experiments were executed across a number of different PC-based platforms, the performance
measures do not include computer processing time, which is not always a consistent
indicator of algorithm quality even on a standard platform. The following performance
measures are used, where $Q$ and $P$ are used in the comparison of all algorithms and $SW$
is used to compare the MGPS-RS variants to each other. In the performance measure
definitions, $x^*$ ($f^*$) refers to the optimal design vector (objective function value),
$x_0$ ($f_0$) to the starting design vector (objective function value), and $x$ ($f$) to
the final design vector (objective function value) after the search.

The measures $Q$ and $P$ are scaled by dividing by the absolute difference between
the starting and optimal values, thereby providing a dimensionless quantity that allows
consistency in comparisons across test problems. In the mixed-variable case, the measure
$P$ is defined so that if the discrete variable has not reached the optimal setting at
termination, the numerator is penalized by one unit regardless of which suboptimal setting
it may have. This is in keeping with the concept that categorical variables have no ordering
so that a setting of “3” should not be perceived as further than “2” from an optimal value
of “1”.

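For each experiment involving an MGPS-RS variant, the following statistical model is
postulated for performance measure $Q$,

$$Q_{ijk\ell} = \beta_0 + \beta_R W_{R_i} + \beta_S W_{S_j} + \beta_N W_{N_k}
  + \beta_{RS} W_{R_i} W_{S_j} + \beta_{RN} W_{R_i} W_{N_k} + \beta_{SN} W_{S_j} W_{N_k}
  + \varepsilon_{ijk\ell} \qquad (5.3)$$

where $1 \le \ell \le 30$ is the replication index, $1 \le k \le 2$ is the noise case
index, $1 \le j \le 2$ is the surrogate index, $1 \le i \le 3$ is the R&S index, and
“coded” independent variables $W_{R_i}$, $W_{S_j}$, and $W_{N_k}$ represent experimental
design factors for R&S, use of surrogates, and noise case, respectively. The design factors
are defined as

$$W_{R_i} = \begin{cases} -1, & \text{for } i = 1 \text{ (RIN)}, \\ 0, & \text{for } i = 2 \text{ (SAS)}, \\ +1, & \text{for } i = 3 \text{ (SSM)}; \end{cases}$$

$$W_{S_j} = \begin{cases} -1, & \text{for } j = 1 \text{ (no surrogates)}, \\ +1, & \text{for } j = 2 \text{ (surrogates)}; \end{cases}$$

and

$$W_{N_k} = \begin{cases} -1, & \text{for } k = 1 \text{ (noise case 1)}, \\ +1, & \text{for } k = 2 \text{ (noise case 2)}. \end{cases}$$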

A  similar  model  is  postulated  for  performance  measure  P. 
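
The coded design can be made concrete with a short sketch. It assumes the model contains an intercept, the three main effects, and the three two-factor interactions, with the R&S factor coded as $-1, 0, +1$ and the other two factors as $\pm 1$; the least-squares fit and all names are illustrative and do not describe how the analysis was actually carried out.

```python
import itertools
import numpy as np

W_R = {1: -1.0, 2: 0.0, 3: 1.0}   # R&S procedure: RIN, SAS, SSM
W_S = {1: -1.0, 2: 1.0}           # no surrogates, surrogates
W_N = {1: -1.0, 2: 1.0}           # noise case 1, noise case 2

def design_matrix(n_reps=30):
    """One row per (i, j, k, replication): intercept, main effects, two-factor interactions."""
    rows = []
    for i, j, k, _ in itertools.product(W_R, W_S, W_N, range(n_reps)):
        r, s, n = W_R[i], W_S[j], W_N[k]
        rows.append([1.0, r, s, n, r * s, r * n, s * n])
    return np.array(rows)

def fit_coefficients(X, q):
    """Ordinary least-squares estimate of the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, q, rcond=None)
    return beta

X = design_matrix()   # 3 x 2 x 2 factor combinations, 30 replications each
```

Because the coded levels are balanced, every effect column sums to zero over the full design, so the intercept estimates the grand mean of the performance measure.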


5.4.2 Selection of Parameter Settings

To  avoid  excessive  parameter  tuning,  a  subset  of  parameter  settings  for  all  algorithms 
are  kept  constant  throughout  the  experiments.  For  the  SA  algorithms  and  NM,  the  rec¬ 
ommended  settings  discussed  in  Section  5.2  are  used  when  appropriate.  Parameter  tuning 
cannot  be  completely  avoided,  however,  and  those  parameters  that  are  adjusted  for  each 
continuous- variable  test  problem  are  identified  with  an  entry  of  “tune”  in  Table  5.3,  which 
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Table  5.3.  Parameter  Settings  for  All  Algorithms  —  Continuous-variable  Prob¬ 
lems. 


Parameter 

Setting 

Parameter 

Setting 

MGPS-RS 

Do 

ir  -n 

(^0 

100 

T 

2 

ao 

0.8 

Aq 

tune 

Ps 

0.95 

ml 

-1 

Pa 

0.95 

0 

So 

5 

P 

10 

e 

10 

strength 

[^low)  ^high] 

2 

[.1,3] 

range 

tune 

FDSA 

asA 

0.602 

c 

]^(Note  1) 

7sa 

0.101 

So 

5 

NS 

Oq 

depends  on  N 
tune 

depends  on  N  and  RS'max 

SPSA 

asA 

0.602 

c 

^(Note  2) 

7sa 

0.101 

So 

5 

NS 

ao 

200 

tune 

Mk 

depends  on  RS'max 

NM 

K 

0.9 

7nm 

2 

1 

1 

P 

0.5 

Anm 

tune 

So 

5 

RNDS 

7T 

Po 

tune 

Note  1:  c  =  .01  (problem  105)  and  c  =  .5  (problem  110) 

Note  2:  c  =  .25  (problem  25),  c  =  .01  (problem  105)  and  c  =  .5  (problem  110) 


summarizes the parameter settings for all algorithms. Note that alternative settings for the
FDSA and SPSA parameter c are used for some test problems. Each of the test problems
requiring this change involves evaluating a logarithm in the objective function, and the
parameter modification was necessary to ensure that the gradient estimator does not attempt
to take the logarithm of a negative number, which can result from a larger setting for c.
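
The role of c can be illustrated with a small sketch of a simultaneous-perturbation gradient estimate. This is not the implementation used in the experiments: the gain sequences below follow the commonly used forms $a_k = a_0/(k+1+A)^{\alpha}$ and $c_k = c/(k+1)^{\gamma}$ (an assumption about the form; only the exponents 0.602 and 0.101 are taken from Table 5.3), and `toy_objective` is a hypothetical function containing a logarithm.

```python
import math
import random

ALPHA_SA, GAMMA_SA = 0.602, 0.101  # exponents listed in Table 5.3


def gains(k, a0, A, c):
    """Assumed standard SA gain sequences a_k and c_k at iteration k >= 0."""
    a_k = a0 / (k + 1 + A) ** ALPHA_SA
    c_k = c / (k + 1) ** GAMMA_SA
    return a_k, c_k


def spsa_gradient(f, x, c_k, rng):
    """Simultaneous-perturbation estimate: two evaluations regardless of len(x)."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
    x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)
    return [diff / (2.0 * c_k * di) for di in delta]


def toy_objective(x):
    """Hypothetical objective with a logarithm; undefined for x[0] <= 0."""
    return -math.log(x[0]) + x[1] ** 2


rng = random.Random(0)
x = [0.3, 1.0]

# Small c: both perturbed points keep x[0] positive, so the estimate is defined.
_, c_small = gains(0, a0=0.1, A=100, c=0.01)
g = spsa_gradient(toy_objective, x, c_small, rng)

# Large c: one perturbed point has x[0] = 0.3 - 0.5 < 0, so math.log fails.
_, c_large = gains(0, a0=0.1, A=100, c=0.5)
try:
    spsa_gradient(toy_objective, x, c_large, rng)
    failed = False
except ValueError:
    failed = True
```

With the larger setting, one of the two perturbed points always leaves the domain of the logarithm, whichever sign the random perturbation takes; this is the kind of failure the reduced values of c are meant to avoid.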

Parameter tuning for the “tunable” parameters was carried out informally by running
each algorithm for a few thousand response samples and observing the output. Care was
taken to ensure that the SA algorithms did not diverge or seem unstable, as they are prone
to do if $a_0$ is set too large, and that the MGPS-RS variants, RNDS, and NM seemed to
Table 5.4. Summary of “Tunable” Parameter Settings for All Algorithms: Continuous-variable Problems.


Problem 

MGPS-RS 

FDSA 

SPSA 

NM 

RNDS 

Number 

Aq 

range 

ao 

^SA 

NS 

ao 

^SA 

Anm 

Po 

3 

.5 

10 

.1 

HI 

mm 

.1 

1000 

NA 

1.0 

4 

.25 

2.5 

.1 

mm 

.02 

1000 

NA 

.5 

5 

.5 

bounds 

.1 

mm 

.02 

1000 

NA 

1.0 

25 

2.0 

bounds 

mm 

mm 

.05 

1000 

NA 

2.5 

36 

1.0 

bounds 

mm 

200 

.01 

1000 

NA 

2.0 

105 

.25 

bounds 

125 

mm 

.001 

1000 

NA 

1.0 

110

.1 

bounds 

.1 

100 

.03 

1000 

NA 

.25 

118 

4 

bounds 

2 

67 

200 

.25 

1000 

NA 

4 

224 

.5 

bounds 

.5 

500 

mm 

.5 

NA 

.5 

244 

2 

5 

.1 

833 

mm 

.1 

8 

1.5 

256 

1 

5 

1 

250 

200 

.05 

1000 

8 

.5 

275 

1 

2.5 

.25 

250 

200 

.25 

1000 

2 

1 

281 

.5 

2.5 

.1 

100 

200 

.02 

1000 

8 

.5 

287 

1 

4 

2.5 

mm 

1 

1000 

2 

.5 

288 

1 

4 

1 

50 

mm 

.5 

1000 

1 

.5 

289 

.1 

2 

.001 

167 

300 

.01 

5000 

2 

.1 

297 

2 

2.5 

2 

167 

mm 

.3 

■rfllllll 

10 

1 

300 

.1 

1 

.05 

50 

.05 

■riTITM 

500 

.5 

301 

2 

10 

.01 

100 

500 

.008 

5000 

500 

.25 

305 

2 

5 

.3 

50 

1000 

.05 

5000 

2 

2 

314 

.25 

1 

.3 

500 

200 

.05 

1000 

10 

.1 

392 

10 

bounds 

10 

167 

300 

1 

5000 

NA 

10 

achieve reasonable progress. The final settings for these parameters for each algorithm and
test problem are displayed in Table 5.4.

For the mixed-variable test problems, the parameter settings used for the MGPS-RS
variants and RNDS in all experiments are displayed in Table 5.5. Two items regarding these
settings are worthy of mention. First, the extended poll trigger is set to a large value
during the early iterations and then reset to a smaller value after the algorithm conducts
two EXTENDED POLL steps. This ensures that extended polling is conducted, so that samples
are generated at alternate settings of the discrete variables even if the original surrogates,
because of their inaccuracies, fail to predict improving designs at those settings during the
SEARCH step (in which case such sampling would otherwise be avoided). Secondly, the R&S
decay parameters $\rho_\delta$ and
Table 5.5. Parameter Settings for MGPS-RS and RNDS Algorithms: Mixed-variable Problems.


Parameter 

Setting 

Parameter 

Setting 

MGPS-RS 

Do 

<5o 

100 

T 

2 

ao 

0.8 

Ao 

.5 

PS,  (MVP1,2) 

0.95 

m+ 

-1 

PS,  p^  (MVP3,4) 

0.99 

0 

So 

5 

4  (MVP1,2) 

200  (10)”°*'= 

0i 

10 

4  (MVP3,4) 

2000  (20)^°*® 

range 

bounds 

strength 

[^low)  ^high] 

2 

[.1,3] 

P 

10 

RNDS 

2 

Note: The notation X(x) indicates that the extended poll trigger is initially set to X but reset to x after
two EXTENDED POLL steps are executed.


$\rho_\alpha$ are set to a higher value (0.99) for the larger problems (MVP3 and MVP4). This ensures
that the parameters $\alpha_r$ and $\delta_r$ decay at a slower rate, with the hope that the algorithm will
not exhaust an excessive portion of the response sampling budget prematurely through a
sequence of R&S procedures conducted during the EXTENDED POLL steps.
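
The effect of the two decay settings can be sketched numerically. The snippet below assumes a simple geometric update, $\alpha_r = \alpha_0 \rho^r$, which is an assumption made here for illustration (the actual update rule is defined elsewhere in the document); the initial value 0.8 and the decay values 0.95 and 0.99 are those listed in Table 5.5, and `decay_schedule` is a hypothetical helper name.

```python
def decay_schedule(initial, rho, steps):
    """Assumed geometric decay: value_r = initial * rho**r for r = 0..steps."""
    return [initial * rho ** r for r in range(steps + 1)]


alpha_fast = decay_schedule(0.8, 0.95, 100)  # decay value used for MVP1 and MVP2
alpha_slow = decay_schedule(0.8, 0.99, 100)  # decay value used for MVP3 and MVP4

# After 100 R&S procedures the slower schedule retains a much larger value,
# so each subsequent procedure is less demanding in response samples.
ratio_after_100 = alpha_slow[100] / alpha_fast[100]
```

Under this assumed update, the value after 100 procedures is roughly 0.29 with a decay of 0.99 and roughly 0.005 with a decay of 0.95, which is consistent with the stated aim of conserving the sampling budget on the larger problems.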

5.5  Results  and  Analysis 

The  computational  experiments  were  run  on  a  total  of  26  different  PC  workstations. 
The  computational  platforms  ranged  in  processing  speed  from  2.00  GHz  to  3.00  GHz,  in 
random  access  memory  from  256  MB  to  2.0  GB,  and  operated  under  the  Windows  2000  or 
Windows  XP  operating  systems.  Three  of  the  platforms  had  dual  processors. 

The  following  sections  summarize  the  quantitative  analysis  of  the  results.  Additional 
test  result  data  are  presented  in  various  forms  in  Appendix  B.  For  example,  to  give  a  visual 
perspective  of  algorithm  progression,  a  series  of  graphs  are  displayed  that  plot  performance 
measure  Q,  averaged  over  30  replications,  versus  the  number  of  response  samples  obtained 
for  each  algorithm,  noise  case,  and  test  problem. 
5.5.1  Analysis  of  MGPS-RS  Variant  Implementations

To evaluate the effect of the various implementation options on performance measures
Q and P within MGPS-RS, a formal analysis of variance (ANOVA) was performed
on the statistical model (5.3) using the JMP™ 5.1 statistical software [121]. To assess
the validity of the model, the estimated studentized residuals were examined using normal
probability plots and the Shapiro-Wilk test for normality [116,125]. In most cases, the
data $Q_{ijk\ell}$ and $P_{ijk\ell}$ required a transformation to approximately satisfy the normality and
constant variance assumptions required by the ANOVA procedure. The commonly used
transformations suggested in Montgomery [93, p. 84] were used, which include the square
root, natural logarithm, reciprocal square root, and reciprocal transformations. Even after
transformation, the Shapiro-Wilk test frequently rejected the null hypothesis that the
residuals were normally distributed at the .05 significance level, perhaps because the
transformed residual distributions remained slightly skewed and the sample size was large
(360 residuals, one from each combination of 6 algorithms, 30 replications, and 2 noise
cases). For a large number of sample points, the cumulative deviation from normality, used
in computing the test statistic, can be more dramatic than for smaller samples, causing the
test to fail. Furthermore, it proved difficult to attain approximately constant variance of
the residuals, which was assessed graphically by plotting the residuals versus fitted values.
The ANOVA procedure is typically robust to moderate departures from normality in the
residuals (see Montgomery [93, p. 77]), and since the sample size is large and balanced
(equal samples for each of the factors $W_R$, $W_S$, and $W_N$), the normality and constant
variance assumptions may be approximately satisfied. Included in Appendix B is a listing
of the transformations used, the results of the Shapiro-Wilk test, normal probability plots, and
residual versus fitted values plots for each test problem.
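
The way a transformation can reduce skewness may be sketched as follows. This is not the procedure used in the analysis: sample skewness is used here only as a crude stand-in for the Shapiro-Wilk test and the residual plots, the data are synthetic log-normal draws rather than experimental results, and the names `TRANSFORMS`, `skewness`, and `least_skewed` are hypothetical.

```python
import math
import random
import statistics

# Candidate transformations named in the text, plus the untransformed data.
TRANSFORMS = {
    "identity": lambda y: y,
    "square root": math.sqrt,
    "natural log": math.log,
    "reciprocal square root": lambda y: 1.0 / math.sqrt(y),
    "reciprocal": lambda y: 1.0 / y,
}


def skewness(values):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def least_skewed(data):
    """Return the transformation whose output has the smallest |skewness|."""
    scores = {name: abs(skewness([t(y) for y in data]))
              for name, t in TRANSFORMS.items()}
    return min(scores, key=scores.get), scores


rng = random.Random(1)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(360)]  # synthetic, positive data
best, scores = least_skewed(sample)
```

For strictly positive, right-skewed data of this kind, the natural logarithm removes most of the skew, whereas the other candidates leave a noticeable asymmetry; in practice the choice would be confirmed with a formal normality test and residual plots, as described above.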
The ANOVA procedure was used to determine the significance of the effects $\beta_R$, $\beta_S$,
$\beta_N$, $\beta_{RS}$, $\beta_{RN}$, and $\beta_{SN}$ in model (5.3). To investigate which R&S procedures led to better
results in the event that $\beta_R$ was significant, the ANOVA was followed up by a multiple
comparison test at the .05 significance level on the transformed data to compare means
$\bar{T}(Q_1)$, $\bar{T}(Q_2)$, and $\bar{T}(Q_3)$, where $\bar{T}(Q_i)$ is defined as

$$\bar{T}(Q_i) = \frac{1}{120} \sum_{j=1}^{2} \sum_{k=1}^{2} \sum_{\ell=1}^{30} T(Q_{ijk\ell}), \quad i = 1, 2, \text{ or } 3, \qquad (5.4)$$

and $T(\cdot)$ denotes the transformation (the same test was used for $\bar{T}(P_i)$, $i = 1, 2, 3$, where
$\bar{T}(P_i)$ is defined as in (5.4)). The multiple comparison test employed the Tukey honestly
significant difference (HSD) procedure (see Sheskin [126, p. 534]), which makes all possible
pairwise comparisons and tests for significant differences among the means, grouping them
accordingly. Mean performance measures assigned to the same group are not statistically
different from each other under this test.
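
The averaging in (5.4) can be sketched in a few lines. The snippet assumes that the mean for R&S level i runs over all 2 x 2 x 30 = 120 transformed observations at that level; the dictionary `q` holds synthetic values chosen so that the result is easy to check, not experimental data, and `mean_transformed` is a hypothetical name.

```python
import math


def mean_transformed(q, i, transform=math.log):
    """Average T(Q_ijkl) over j = 1..2, k = 1..2, l = 1..30 for R&S level i."""
    values = [transform(q[(i, j, k, l)])
              for j in (1, 2) for k in (1, 2) for l in range(1, 31)]
    return sum(values) / len(values)  # len(values) is 120


# Synthetic observations: Q_ijkl = exp(i), so the mean of log(Q) at level i equals i.
q = {(i, j, k, l): math.exp(i)
     for i in (1, 2, 3) for j in (1, 2) for k in (1, 2) for l in range(1, 31)}

means = [mean_transformed(q, i) for i in (1, 2, 3)]
```

The three resulting means are the quantities that a multiple comparison procedure such as Tukey HSD would then group.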

As a safety precaution in the event of violated model assumptions, a battery of
nonparametric statistical procedures was also used to test for differences among the factor
populations. In particular, the following methods were used:

•  the Wilcoxon rank-sum test [126, p. 289] for two factor levels ($W_S$ and $W_N$) or
the Kruskal-Wallis one-way ANOVA [126, p. 597] for three factor levels ($W_R$),

•  the two-sample median test [71, p. 304] for two factor levels ($W_S$ and $W_N$) or
the Brown-Mood k-sample test [71, p. 315] for three factor levels ($W_R$), and

•  the van der Waerden test [126, p. 611] for any number of factor levels ($W_S$,
$W_N$, and $W_R$).

The Wilcoxon rank-sum procedure tests whether the medians of two sample
populations differ. The Kruskal-Wallis procedure can be considered an extension
of the Wilcoxon procedure that tests whether at least two of k sample populations have
different median values. The two-sample median procedure tests whether two populations
have the same cumulative distribution function (c.d.f.) by categorizing each sample from both
populations according to whether it is above or below the composite median value
and counting the instances of each category. The Brown-Mood procedure extends this to k
sample populations and tests whether at least two of the populations have a different c.d.f.
The van der Waerden procedure also tests whether at least two of k sample populations
come from different distributions. This procedure organizes the data into a set of
rank-orders, then transforms the rank-orders into a set of normal scores (z scores) from a standard
normal distribution. If the averages of the normal scores of all populations are not equal at
a prescribed significance level, then the null hypothesis that the populations derive from
the same c.d.f. is rejected.
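
The rank-to-normal-score step of the van der Waerden procedure can be sketched as below. This is a minimal illustration under stated assumptions, not the software used in the analysis: it uses the usual form of the statistic (normal scores $\Phi^{-1}(r/(N+1))$ and a chi-square reference distribution with k - 1 degrees of freedom), ignores ties, and is applied to made-up data; the function name `van_der_waerden` is hypothetical.

```python
from statistics import NormalDist


def van_der_waerden(groups):
    """Return the van der Waerden statistic for k samples (no tie handling)."""
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n_total = len(pooled)
    inv = NormalDist().inv_cdf
    scores = [[] for _ in groups]
    for rank, (_, g) in enumerate(pooled, start=1):
        scores[g].append(inv(rank / (n_total + 1)))  # normal score of the pooled rank
    s2 = sum(a * a for grp in scores for a in grp) / (n_total - 1)
    return sum(len(grp) * (sum(grp) / len(grp)) ** 2 for grp in scores) / s2


CHI2_95_DF2 = 5.991  # upper 5% point of chi-square with k - 1 = 2 degrees of freedom

# Three clearly separated illustrative samples (k = 3, N = 12).
samples = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
stat = van_der_waerden(samples)
reject = stat > CHI2_95_DF2
```

For these well-separated samples the statistic exceeds the .05 critical value, so the hypothesis of a common distribution would be rejected.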

The results of the significance tests on the main effects $\beta_R$, $\beta_S$, and $\beta_N$ are displayed in
Tables 5.6 (continuous-variable problems) and 5.7 (mixed-variable problems). In the tables,
the absence of an entry indicates that the effect corresponding to that column
tested as insignificant for that test problem. For example, the effect $\beta_R$ was insignificant
for performance measure P on test problem 3, so the choice of R&S procedure had no effect
on proximity to the true optimal solution at termination. For effect $\beta_S$, an entry “+”
indicates that employing surrogates had a positive effect (indicating improvement) on
the performance measure, whereas an entry “-” indicates a negative effect. Similarly, for effect
$\beta_N$, an entry “+” indicates that going from noise case 1 to 2 had a positive effect on
the performance measure, whereas an entry “-” indicates a negative effect. For effect $\beta_R$, the
entry indicates the results of the Tukey HSD multiple comparison test, where the groups
are listed in descending order of performance measure quality in terms of the transformed
data. For example, in test problem 3, using Rinott’s procedure resulted in a better and
statistically different mean $\bar{T}(Q_1)$ than using SSM ($\bar{T}(Q_3)$), but $\bar{T}(Q_2)$ (SAS) was not
Table 5.6. Significance Tests of Main Effects for Performance Measures Q and P: Continuous-variable Test Problems.


Test 

Problem 

p 

I^R 

Ps 

Pn 

Pr 

Ps 

Pn 

3 

1- SAS,  RIN 

2- SAS,  SSM 

+ 

+ 

- 

4 

+ 

+ 

+ 

+ 

5 

+ 

+ 

+ 

25 

+ 

- 

+ 

+ 

36 

1- RlN,  SSM 

2- RlN,  SAS 

+ 

+ 

+  (*) 

+ 

105 

+ 

+ 

+ 

+ 

110

+ 

+ 

118 

+ 

+ 

+ 

+ 

224 

+ 

+ 

244 

+ 

+ 

+ 

+  w 

256 

1- SAS,  SSM 

2- SAS,  RIN 

+ 

+ 

+  (*) 

+ 

275 

+ 

281 

1- SAS,  SSM 

2- SAS,  RIN 

+ 

+ 

1- SSM 

2- SAS,  RIN 

+ 

287 

1- RlN  .  . 

2- SAS,  SSM 

+ 

+ 

1- SAS,  RIN  .  . 

2- SAS,  SSM 

+ 

+ 

288 

1- SSM 

2- RlN 

3- SAS 

- 

+ 

1- SSM 

2- RlN 

3- SAS 

-  w 

+ 

289 

1- SSM,  SAS  .  . 

2- SSM,  RIN 

- 

1- SSM,  SAS  .  . 

2- SSM,  RIN 

- 

297 

1- SSM,  RIN 

2- SAS 

- 

+ 

1- SSM 

2- RlN 

3- SAS 

+ 

300 

1- SSM,  SAS  ,  , 

2- RIN  <*> 

+ 

+ 

+ 

+ 

301 

1- SSM 

2- RlN 

3- SAS 

+  w 

+ 

1- SSM,  RIN 

2- SAS 

+ 

305 

1- SSM 

2- RlN 

3- SAS 

+  w 

+ 

1- SSM 

2- SAS,  RIN 

+  (*) 

+ 

314 

1- RlN,  SAS 

2- RlN,  SSM 

+  (*) 

+ 

(*) 

+ 

392 

- 

+ 

(*) 

- 

+ 
Table 5.7. Significance Tests of Main Effects for Performance Measures Q and P: Mixed-variable Test Problems.


Test 

Problem 

P 

Pr 

Ps 

I^N 

Pr 

Ps 

I^N 

MVP1

1- SSM 

2- SAS,  RIN 

+ 

1- SSM,  RIN  ,  . 

2- SAS,  RIN 

+ 

MVP2 

1- SSM 

2- SAS,  RIN 

{*) 

+ 

1- SSM 

2- SAS,  RIN 

+ 

MVP3 

1- SSM 

2- SAS,  RIN 

+ 

1- SSM 

2- SAS,  RIN 

+ 

MVP4 

1- SSM 

2- SAS,  RIN 

+  {*) 

+ 

1- SSM 

2- SAS,  RIN 

+ 

statistically different from $\bar{T}(Q_1)$ or $\bar{T}(Q_3)$. Finally, an entry (or nonentry) accompanied
by a symbol “(*)” indicates that at least two of the three nonparametric procedures were
in disagreement with the ANOVA and/or multiple comparison results. For example, in test
problem 36, all three nonparametric tests actually failed to be significant; the Kruskal-Wallis
test did not detect a difference between any of the medians of the observation populations
$Q_{1jk\ell}$, $Q_{2jk\ell}$, or $Q_{3jk\ell}$ ($1 \le j \le 2$, $1 \le k \le 2$, $1 \le \ell \le 30$), nor did the Brown-Mood or
van der Waerden test detect a difference in the distributions from which those data were
drawn. More detailed results of the nonparametric tests are included in Appendix B.

The  results  indicate  a  strong  agreement  between  the  ANOVA/multiple  comparison 
tests  and  the  nonparametric  tests.  Of  the  180  possible  tests  for  significance  of  main  effects, 
only  20  are  contradicted  by  at  least  two  nonparametric  tests,  which  provides  some  validation 
that  the  ANOVA  results  may  be  justified.  Most  of  the  disagreements  occur  with  respect  to 
the  effect  of  the  R&S  procedure  and  the  use  of  surrogates,  typically  refuting  the  possible 
effects  predicted  by  the  ANOVA  procedure. 

Not surprisingly, the results show that better results are almost always achieved in
noise case 2 (low noise) than in noise case 1 (high noise); according to the ANOVA results,
the change in noise case had a positive effect on Q and P in 24 and 23, respectively, of the
26 problems. This demonstrates the adverse effects that high response noise can have on
solution quality over a fixed budget of response samples.

An encouraging finding is that, for 16 of 26 problems (better than 60%), the use of
surrogates had a positive effect on solution quality Q at termination using the ANOVA
results. It is interesting to note that in four of the problems, the use of surrogates actually
had a negative effect on Q. This occurred for some of the larger problems (one with $n^c = 20$
and three with $n^c = 30$) and may be due to the fact that the number of design sites used
to build the original surrogate was not increased as the size of the problem increased. As a
result, the surrogate functions for the larger problems may have suffered from larger relative
inaccuracies that caused the algorithm to search unpromising regions of the design space
in vain. The effect of surrogate use on P is similar to that on Q but not as pronounced; in
ten instances the use of surrogates had a positive effect (each corresponding to one of the
16 positive effect instances for Q) and in four instances had a negative effect (three of them
corresponding to the four instances for Q). For problem 3, the use of surrogates actually
positively affected Q but negatively affected P.

Similar observations can be made regarding the use of the various R&S procedures. For
performance measure Q, sixteen of the 26 problems showed significant effects for parameter
$\beta_R$. The multiple comparison tests revealed that procedure SSM was in the lead group
(delivering the smallest mean $\bar{T}(Q_i)$) on thirteen occasions, whereas SAS and RIN were in
the lead group on six and five occasions, respectively. In the case of measure P, effect $\beta_R$
was significant eleven times and SSM, SAS, and RIN were in the lead group on ten, two,
and three occasions, respectively. These results seem to indicate an advantage of using the
SSM procedure over the range of test problems, but it is interesting to note that RIN is not
always dominated by either SSM or SAS when $\beta_R$ is significant. Perhaps the conditions
under which the more modern procedures SSM and SAS perform well (heterogeneity of true
objective function values among the candidate set) do not occur as frequently as expected,
at least for the test problems and noise structure considered in this research. On the other
hand, it should be mentioned that the choice of R&S procedure shows an effect with greater
frequency for the large problems than for the small and medium problems. Furthermore,
Rinott’s procedure is in the lead group for the large problems on only one occasion for
Q (shared with SSM) and never for P. This suggests that the choice of R&S procedure
becomes more critical as problem dimension grows.

Tables 5.6 and 5.7 do not make reference to the interaction terms $\beta_{RS}$, $\beta_{RN}$, and $\beta_{SN}$
in model (5.3). For the most part, these terms failed to be significant. Term $\beta_{SN}$ was
significant in thirteen problems for measure Q and nine problems for measure P. Each of
the terms $\beta_{RS}$ and $\beta_{RN}$ was significant in either five or six problems for both measures.
Furthermore, no systematic trend exists for the cases that tested significant. For example,
the term $\beta_{SN}$ had a positive effect on Q in seven cases and a negative effect in six cases.

The  terminal  values  for  Q  and  P,  averaged  over  60  replications  (30  for  each  noise  case) 
for  each  of  the  six  MGPS-RS  variants  are  presented  in  Tables  5.8  and  5.9,  respectively.  In 
each  table,  the  best  result  of  the  six  algorithms  is  enclosed  by  a  rectangle.  However,  it  should 
be  noted,  as  indicated  in  the  preceding  statistical  analysis,  that  the  best  is  not  necessarily 
statistically  significant.  From  Table  5.8,  it  can  be  seen  that  for  18  of  26  problems,  the 
average  objective  function  value  of  the  best  result  at  termination  has  progressed  to  within 
10%  of  the  difference  between  the  starting  and  optimal  solutions.  Table  5.9  illustrates 
that  most  average  terminal  values  of  P  have  improved  from  their  original  value  of  1.0, 
indicating  that  the  terminal  design  has  moved  closer  to  the  optimal  solution  -  an  obvious 
Table 5.8. Terminal Value for Performance Measure Q Averaged over 60 Replications (30 for each noise case): MGPS-RS Algorithms.

Test Problem  S-MGPS-RIN  S-MGPS-SSM  S-MGPS-SAS  MGPS-RIN    MGPS-SSM    MGPS-SAS
3             0.01308     0.02231     0.02333     0.15040     0.23041     0.26814
4             0.11604     0.06449     0.09656     0.42984     0.38489     0.37994
5             0.06853     0.06905     0.12139     0.13667     0.08224     0.13039
25            0.06845     0.07656     0.08417     0.10561     0.10226     0.10697
36            9.184e-5    6.877e-5    9.888e-5    14.64e-5    12.78e-5    13.92e-5
105           0.12228     0.10386     0.12043     0.40080     0.39563     0.40299
110           0.60609     0.57289     0.65849     0.57437     0.60648     0.61715
118           0.07445     0.07561     0.06792     0.09377     0.06721     0.08202
224           0.00163     0.00178     0.00182     0.00276     0.00211     0.00178
244           0.38432     0.37756     0.60000     0.37394     0.43310     0.36920
256           0.00096     0.00050     0.00069     0.00129     0.00127     0.00128
275           0.00532     0.00552     0.00563     0.00550     0.00564     0.00745
281           0.21216     0.14964     0.20407     0.28558     0.22912     0.25370
287           5.662e-4    6.751e-4    7.043e-4    4.223e-4    4.166e-4    4.271e-4
288           0.00233     0.00398     0.00396     0.00126     0.00036     0.00142
289           1.20395     1.19910     1.17340     1.00405     1.00118     1.00119
297           3.510e-4    8.168e-4    18.42e-4    1.463e-4    0.570e-4    9.565e-4
300           0.94788     0.92655     0.91767     0.98916     0.97512     0.97711
301           0.96703     0.93819     1.00422     0.98656     0.94338     1.01613
305           5.032e-9    4.910e-9    5.126e-9    5.071e-9    4.939e-9    5.202e-9
314           0.03267     0.03073     0.02279     0.03343     0.03318     0.03038
392           0.59389     0.58836     0.59115     0.50274     0.48743     0.48711
MVP1          0.00338     0.00228     0.00344     0.00271     0.00140     0.00238
MVP2          0.00533     0.00309     0.00316     0.00316     0.00237     0.00431
MVP3          0.01481     0.00866     0.01270     0.01315     0.00938     0.01224
MVP4          0.00713     0.00497     0.00640     0.00613     0.00612     0.00681

sign of convergence. In 24 of 26 cases for Q, and 22 of 26 cases for P, the best solution is
produced by an algorithm that uses surrogates, the SSM procedure, or both.

Tables 5.8 and 5.9 also reflect the poor performance of the algorithms for some of
the more difficult problems. In particular, problems 300 and 301 are both instances of a
gradually sloping quadratic with $n - 1$ cross terms so that the function contours are not

Table 5.9. Terminal Value for Performance Measure P Averaged over 60 Replications (30 for each noise case): MGPS-RS Algorithms.

Test Problem  S-MGPS-RIN  S-MGPS-SSM  S-MGPS-SAS  MGPS-RIN    MGPS-SSM    MGPS-SAS
3             1.39540     1.53845     1.45450     0.97642     0.98390     0.98830
4             0.43096     0.23574     0.35860     0.97671     0.85905     0.72920
5             0.23016     0.27609     0.38620     0.31260     0.22689     0.29629
25            0.60901     0.69032     0.62041     0.88297     0.87873     0.90482
36            2.771e-4    2.353e-4    2.941e-4    3.211e-4    2.804e-4    3.137e-4
105           0.62117     0.61531     0.66029     0.93319     0.93494     0.93786
110           0.72689     0.68316     0.76373     0.70208     0.72930     0.74529
118           0.54801     0.57414     0.53927     0.58954     0.56467     0.60107
224           0.05014     0.06133     0.05241     0.06558     0.06494     0.05306
244           0.45466     0.39325     0.48229     0.67024     0.69413     0.63578
256           0.10871     0.09211     0.09291     0.12091     0.11032     0.10830
275           0.43201     0.49222     0.47172     0.46632     0.44355     0.42695
281           0.50573     0.37075     0.48541     0.39984     0.35480     0.41369
287           0.36035     0.38910     0.38330     0.44449     0.44574     0.44300
288           0.12594     0.13047     0.16748     0.11530     0.07931     0.11538
289           1.24475     1.23490     1.20105     1.00400     1.00132     1.00130
297           0.02478     0.02168     0.03921     0.03703     0.02244     0.04170
300           0.96132     0.96618     0.96129     0.99533     0.99445     0.99533
301           0.99200     0.99152     0.99476     0.99179     0.99098     0.99400
305           4.51285     4.46560     4.52590     4.52985     4.47735     4.54240
314           0.38761     0.41910     0.33489     0.41407     0.43805     0.39969
392           0.96754     0.95826     0.96644     0.87721     0.86539     0.86532
MVP1          0.10853     0.09774     0.09847     0.09504     0.07842     0.09871
MVP2          0.13581     0.10122     0.09606     0.09820     0.11828     0.11486
MVP3          0.43704     0.31779     0.38668     0.34565     0.25619     0.32271
MVP4          0.46690     0.30877     0.39041     0.28361     0.36255     0.37638

along the coordinate axes. This is a hindrance in these experiments because the search
directions are set to the axes.


Problem 289 is also a challenging problem with a general nonlinear objective function
for which the starting value and optimal value differ only by 0.6963. Hence, the noise has
a greater influence for this problem, even in the low noise case, because the noise observed
at different candidate designs can dominate the magnitude of the true objective function
value difference between those designs. It is also of larger dimension (n = 30) than most of
the problems. On a smaller scale, problem 244 also presented difficulties for the algorithms.
This three-dimensional problem is similar to problem 289 in that the difference between
starting and optimal objective function values is small (1.5988); as a result, the best of the
MGPS-RS algorithms could only achieve about a 63% reduction in performance measure Q.


From the results of this section, it appears that enough evidence exists to claim that
procedure SSM offers performance advantages over the other R&S procedures. However,

Table 5.10. Number of Switches SW at Termination Averaged over 60 Replications (30 for each noise case): MGPS-RS Algorithms.

Test Problem  S-MGPS-SSM  S-MGPS-SAS  MGPS-SSM    MGPS-SAS
3             80,963      155         90,254      222
4             77,202      134         89,675      223
5             80,110      139         90,498      221
25            80,150      219         89,132      292
36            83,714      287         89,718      288
105           83,697      394         88,439      448
110           74,632      365         85,434      666
118           77,749      459         83,910      686
224           90,375      222         90,329      223
244           84,123      226         89,533      292
256           79,485      200         90,017      355
275           89,908      354         87,496      355
281           78,739      421         84,577      672
287           77,695      746         76,441      951
288           60,006      419         79,742      1,012
289           377,400     1,713       405,585     2,144
297           380,265     1,265       408,045     2,061
300           77,073      973         79,512      1,056
301           369,585     2,742       391,645     3,354
305           342,265     2,596       370,750     3,811
314           77,775      141         89,842      224
392           362,100     150         250,635     69
MVP1          87,315      320         87,002      336
MVP2          37,239      142         62,727      237
MVP3          353,845     4,945       416,210     5,126
MVP4          234,737     3,331       284,115     4,156


as discussed in Section 4.1, this discussion is not complete without evaluating the number
of switches SW required by the algorithms. Table 5.10 presents the number of cumulative
switches required, averaged over 60 replications (30 for each noise case) for the algorithm
variants using the SSM and SAS procedures (recall that Rinott’s procedure incurs no
switching). The table shows that switching for the fully sequential SSM procedure can be
quite significant, requiring more switches than SAS by approximately two orders of
magnitude on each of the test problems. If used to optimize a real-world system by evaluating
a simulation model, this cost must be taken into account before deciding which algorithm
variant to use. If the switching cost is negligible relative to the cost of simulation execution,
it should not have much impact. However, as Hong and Nelson [59] suggest, switching cost
can sometimes exceed sampling costs by orders of magnitude, which could make the use of
SSM within MGPS-RS computationally prohibitive.
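
The trade-off can be made concrete with a simple linear cost model. The sketch below is illustrative only: the switch counts are the problem 3 entries for S-MGPS-SSM and S-MGPS-SAS in Table 5.10, whereas the sampling budget of 100,000 and the unit costs are hypothetical values chosen for the example, and `total_cost` is an assumed helper name.

```python
def total_cost(samples, switches, cost_per_sample, cost_per_switch):
    """Linear cost model: sampling cost plus switching cost (illustrative)."""
    return samples * cost_per_sample + switches * cost_per_switch


SAMPLES = 100_000        # hypothetical common sampling budget
SW_SSM, SW_SAS = 80_963, 155  # problem 3, S-MGPS-SSM and S-MGPS-SAS (Table 5.10)

# Case 1: a switch costs 1% of a response sample.
ratio_cheap = (total_cost(SAMPLES, SW_SSM, 1.0, 0.01)
               / total_cost(SAMPLES, SW_SAS, 1.0, 0.01))

# Case 2: a switch costs 100 times a response sample.
ratio_expensive = (total_cost(SAMPLES, SW_SSM, 1.0, 100.0)
                   / total_cost(SAMPLES, SW_SAS, 1.0, 100.0))
```

Under these assumed costs the two variants are within about one percent of each other when switching is cheap, but the SSM variant is roughly seventy times more expensive when a switch costs one hundred samples.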

5.5.2  Comparative  Analysis  of  All  Algorithm  Implementations 

In this subsection, the analysis of the results is extended to the comparison of MGPS-RS
with the competing algorithm implementations presented in Section 5.2. The terminal
values for Q and P, averaged over 60 replications (30 for each noise case) for each of the
four competing algorithms, are presented in Tables 5.11 and 5.12, which also include the best
result of the MGPS-RS variants. The appropriate MGPS-RS algorithm is listed where,
for convenience, the algorithm name is shortened to, for example, S-RIN for “surrogate-assisted
MGPS with Rinott’s procedure” or RIN for “MGPS with Rinott’s procedure and no
surrogates”. In each table, the result that delivered the best average performance measure
is enclosed by a rectangle.

The  results  indicate  that,  for  the  continuous- variable  problems,  the  best  results  are 
distributed  primarily  among  the  two  SA  procedures.  Of  the  22  continuous-variable  prob- 
Table 5.11. Terminal Value for Performance Measure Q Averaged over 60 Replications (30 for each noise case): FDSA, SPSA, RNDS, NM, and Best
MGPS-RS Algorithms.


Test 

Problem 


3 

4 

5 

25 

36 

105 

110

118 

224 

244 

256 

275 

281 

287 

288 
289 
297 

300 

301 
305 
314 
392 

MVP1

MVP2 

MVP3 

MVP4 


Best  of 
MGPS-RS 


S-RIN 

S-SSM 

S-RIN

S-RIN 

S-SSM 

S-SSM 

S-SSM 

SSM 

S-RIN 

SAS 

S-SSM 

S-RIN 

S-SSM 

SSM 

SSM 

SSM 

SSM 

S-SAS 

S-SSM 

S-SSM 

S-SAS 

SSM 

SSM 

SSM 

S-SSM 

S-SSM 


0.01308 

0.06449 

0.06853 

0.06845 


0.069e-3 


0.10386 


0.57289 

0.06721 

1.628e-3 

0.36920 

4.953e-4 

5.322e-3 

0.14964 


4.166e-4 


3.639e-4 

1.00118 


0.057e-3 


0.91767 

0.93819 

49.10e-10 

0.02279 

0.48743 


0.00140 


0.00237 


0.00866 


0.00497 


FDSA 


0.00187 

0.00018 

0.00071 

0.09150 

1.260e-3 

0.77604 

0.04767 


0.00245 


Note  1 


0.030e-3 


0.00987 

0.042e-4 

0.049e-3 

0.06015 

18.18e-4 


9.443e-4 


0.99157 

4.024e-3 

0.80106 

0.95923 

2.868e-10 

0.00026 


0.00768 


Note  3 


SPSA 


0.00563 

0.00451 

0.00098 

0.50503 

0.421e-4 

0.69540 


16.09e-4 

0.113e-3 

0.09278 

4.784e-4 


0.00817  ^ 


RNDS  NM 


0.04866 

- 

0.15692 

- 

0.03466 

- 

0.03091 

7.028e-3 

- 

0.61353 

- 

0.12655 

- 

0.32007 

- 

1.101e-3

- 

0.05213 

0.50936 

3.347e-4 

15.89e-4 

3.286e-3 

3.307e-3 

0.21383 

0.18178 

12.38e-4 

5.475e-4 

239.7e-4 

211.3e-4 

1.08630 

0.99534 

6.357e-3 

10.13e-3 

0.91633 

0.39490 

1.08404 

0.76375 

3988.e-10 

16.61e-10 

0.00958 

0.01749 

0.85567 

- 

0.00614 

- 

0.01333 

- 

0.02184 

- 

0.01110 

— 

Note  1:  45  of  60  terminal  solutions  were  infeasible  with  average  maximum  constraint  violation 
(MCV)  of  .00226,  maximum  MCV  of  0.02020. 

Note  2:  3  of  60  were  infeasible  with  average  MCV  of  .00024,  maximum  MCV  of  0.00032. 

Note  3:  All  60  were  infeasible  with  average  MCV  of  .02717,  maximum  MCV  of  0.044054. 

Note  4:  2  of  60  were  infeasible  with  average  MCV  of  .00315,  maximum  MCV  of  0.00600. 


lems,  one  of  these  methods  claimed  the  best  average  performance  for  each  measure  Q  and 
P  in  17  cases.  The  FDSA  implementation  tended  to  perform  better  in  low  dimensions 
(problems  3,  4,  5,  256,  and  275)  where  SPSA  performed  better  in  larger  dimensions  (prob- 
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lems  288,  289,  300,  and  301).  This  is  a  tribute  to  SPSA’s  efficient  technique  for  estimating 
the  gradient,  for  which  response  samples  are  required  at  only  two  design  points  regardless 
of  problem  dimension.  The  results  obtained  by  SPSA  on  problems  289,  300,  and  301  are 
fairly  remarkable  considering  the  poor  performance  of  the  remaining  methods,  although 
the  Nelder-Mead  method  enjoyed  limited  success  for  quadratic  problems  300  and  301. 
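
The difference in sampling cost between the two SA gradient estimates can be illustrated with a minimal sketch. It assumes a two-sided finite-difference form for FDSA and a Rademacher (plus or minus one) perturbation for SPSA; the function names are hypothetical, and the implementations tested in this research may differ in detail.

```python
import random

def spsa_gradient(f, x, c, rng):
    """SPSA estimate: all coordinates perturbed at once, two evaluations total."""
    # Rademacher (+1/-1) perturbation direction, one entry per coordinate
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)  # the only two response samples
    return [diff / (2.0 * c * di) for di in delta]

def fdsa_gradient(f, x, c):
    """Two-sided finite differences: two evaluations per coordinate (2n total)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += c
        x_minus[i] -= c
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * c))
    return grad

# Count response evaluations on a 10-dimensional quadratic (illustrative only)
calls = {"n": 0}

def quadratic(x):
    calls["n"] += 1
    return sum(v * v for v in x)

x0 = [1.0] * 10
spsa_gradient(quadratic, x0, 0.1, random.Random(0))
spsa_calls = calls["n"]
calls["n"] = 0
fdsa_gradient(quadratic, x0, 0.1)
fdsa_calls = calls["n"]
print(spsa_calls, fdsa_calls)
```

In this sketch the SPSA cost stays at two evaluations as the dimension grows, while the finite-difference cost grows linearly, which is consistent with the relative performance observed on the larger problems.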

Although the SA algorithm implementations appear to have superior performance for
this group of continuous-variable test problems, MGPS-RS was able to obtain the best
average performance on four occasions for each performance measure. This is partly due to
the fact that MGPS-RS searches entirely within the feasible region for constrained
problems. This was a benefit for problems 25 and 105 because the objective function for these
problems can evaluate to a complex number for certain points outside the feasible region.
In the MGPS-RS case, infeasible points are easily handled by assigning an arbitrarily large
objective function value without sampling. For the SA algorithms, however, the rules
governing the perturbation parameter c_k require this parameter to be set relatively large in
early iterations (recommended equal to one standard deviation of response sample noise)
so that it can gradually decay to zero. If there are restrictive bounds on some variables (as
in problem 105) and if infeasible points cannot be evaluated, this leads to a smaller initial
setting for c_k. This has a negative impact on gradient accuracy which, to retain stability,
necessitates a smaller setting for the initial step size, and therefore slows the convergence
since the step size also decays with k.
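
The treatment of infeasible points described above for MGPS-RS can be sketched as a wrapper around the response sampler. The names make_barrier_response, sample, and is_feasible are hypothetical, and the one-variable square-root response is only a stand-in for an objective that is not real-valued outside the feasible region.

```python
import math
import random

def make_barrier_response(sample, is_feasible, big=1.0e30):
    """Wrap a noisy response so that infeasible designs are never sampled."""
    def response(x):
        if not is_feasible(x):
            return big  # arbitrarily large value; no simulation call is made
        return sample(x)
    return response

# Illustrative response: sqrt(x) plus noise, real-valued only for x >= 0
rng = random.Random(1)

def noisy_sqrt(x):
    return math.sqrt(x) + rng.gauss(0.0, 0.1)

response = make_barrier_response(noisy_sqrt, lambda x: x >= 0.0)
print(response(-0.5))  # barrier value; noisy_sqrt is not called
print(response(4.0))   # one noisy sample near 2
```

Because the barrier value is returned before any sampling takes place, an infeasible candidate consumes none of the response sample budget.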

Table 5.12. Terminal Value for Performance Measure P Averaged over 60 Replications
(30 for each noise case): FDSA, SPSA, RNDS, NM, and Best MGPS-RS Algorithms.

Test       Best of MGPS-RS
Problem    Variant   Value       FDSA               SPSA       RNDS        NM
3          RIN       0.97642     0.97415            0.94414    1.06560     -
4          S-SSM     0.23574     0.00067            0.01677    0.39224     -
5          SSM       0.22689     0.02264            0.03144    0.15162     -
25         S-RIN     0.60901     0.99261            0.90349    0.86439     -
36         S-SSM     0.00024     0.00384            0.00119    0.01972     -
105        S-SSM     0.61531     0.99959            0.99666    0.98841     -
110        S-SSM     0.68316     0.18265            0.12051    0.28626     -
118        S-SAS     0.53927     0.26615 (Note 1)   0.21932    0.73118     -
224        S-RIN     0.05014     0.00438            0.00258    0.05936     -
244        S-SSM     0.39325     0.09379            0.08616    0.29376     0.61167
256        S-SSM     0.09211     0.00268            0.17929    0.08571     0.15597
275        SAS       0.42695     0.07361            0.19038    1.25040     0.42242
281        SSM       0.35480     0.03045            0.25663    0.90201     0.63811
287        S-RIN     0.36035     0.28593            0.33258    0.43073     0.39466
288        SSM       0.07931     0.15083            0.00632    0.57399     0.59733
289        SSM       1.00132     0.99194            0.24715    1.09515     0.99590
297        S-SSM     0.02168     0.63818            0.08092    0.62075     0.52845
300        S-SAS     0.96129     0.94894            0.18877    0.96570     0.39665
301        SSM       0.99098     0.99689            0.81008    0.99746     0.73491
305        S-SSM     4.46560     0.47072            1.26915    39.0515     2.59755
314        S-SAS     0.33489     0.03300            0.02249    0.22037     0.30277
392        SSM       0.86539     0.26997 (Note 3)   0.15424    0.98951     -
MVP1       SSM       0.07842                                   0.17562     -
MVP2       S-SAS     0.09606                                   0.21823     -
MVP3       SSM       0.25619                                   1.16450     -
MVP4       RIN       0.28361                                   1.05580     -

Note 1: 45 of 60 terminal solutions were infeasible with average maximum constraint violation
(MCV) of .00226, maximum MCV of 0.02020.

Note 2: 3 of 60 were infeasible with average MCV of .00024, maximum MCV of 0.00032.

Note 3: All 60 were infeasible with average MCV of .02717, maximum MCV of 0.044054.

Note 4: 2 of 60 were infeasible with average MCV of .00315, maximum MCV of 0.00600.

Another disadvantage of the SA algorithms for constrained problems, as implemented
in this research, is that the simple method for correcting infeasible designs suggested in
Section 5.2 was unable to avoid infeasibilities for two of the linearly constrained problems
(118 and 392). For each replication, the maximum constraint violation (MCV) at termination
was recorded, and the average and maximum MCV (over the 60 replications) are
annotated in Tables 5.11 and 5.12. An advantage of MGPS-RS algorithms in the presence
of linear constraints is that the direction set can be easily updated to incorporate conforming
directions, so long as the constraints are not degenerate.
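
The MCV bookkeeping described above can be sketched as follows. The sketch assumes the linear constraints are written as A x <= b and that the violation of a constraint is the amount by which its left-hand side exceeds its right-hand side; the exact definition used in this research is not restated here, and the function names are hypothetical.

```python
def max_constraint_violation(A, b, x):
    """Largest violation of A x <= b at x; 0.0 when x is feasible."""
    worst = 0.0
    for row, bi in zip(A, b):
        lhs = sum(a * xi for a, xi in zip(row, x))
        worst = max(worst, lhs - bi)
    return worst

def summarize_mcv(A, b, terminal_points):
    """Count infeasible terminal points; average and maximum MCV over all replications."""
    mcvs = [max_constraint_violation(A, b, x) for x in terminal_points]
    n_infeasible = sum(1 for v in mcvs if v > 0.0)
    return n_infeasible, sum(mcvs) / len(mcvs), max(mcvs)

# Illustrative data: one constraint x1 + x2 <= 1 and three terminal points
A = [[1.0, 1.0]]
b = [1.0]
points = [[0.75, 0.5], [0.25, 0.25], [1.0, 0.5]]
print(summarize_mcv(A, b, points))
```

The returned triple mirrors the quantities reported in the table notes: the number of infeasible terminal solutions, the average MCV, and the maximum MCV.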

Even  though  MGPS-RS  has  some  built-in  advantages  for  some  of  the  constrained 
problems,  it  is  also  able  to  generate  competitive  results  for  some  of  the  larger,  unconstrained 
problems  (287,  297,  and  305).  The  RNDS  and  NM  methods  developed  for  this  research 
cannot  make  this  claim  in  general.  Each  of  these  methods  is  outperformed  by  one  of  the 
other  methods  in  every  case  but  one.  In  problem  25,  RNDS  generates  the  best  average 
result  for  Q  and  in  problem  301,  NM  generates  the  best  average  result  for  P.  Algorithm 
NM  also  is  at  a  disadvantage  because  it  cannot  be  applied  unmodified  to  the  constrained 
problems. 

For each of the mixed-variable problems, MGPS-RS outperformed RNDS on both
performance measures. This is not surprising since the same conclusion also generally held
for the continuous-variable problems.

5.5.3  Termination  Criteria  Analysis 

To complete the analysis of the results, the termination criteria proposed in Section
4.3 are evaluated in this subsection. To facilitate the analysis, various output data were
generated in addition to the performance measures described in Section 5.4. At various
stages of algorithm execution, the following data were saved to the output file:

•  standard deviation of the incumbent design response S_inc after the initial stage
of s_0 samples,

•  indifference zone parameter value δ,

•  significance level parameter value α,

•  step length parameter value Δ,

•  number of iterations completed, and

•  number of response samples RS obtained.

Using (4.7) as a guide to predict when the per-iteration sampling requirements might
grow rapidly, the analysis was conducted by finding the first point during algorithm
progression at which the ratio in (4.7) exceeded unity (using K = 1). The iteration at
which this occurred, denoted as k', was recorded, as were the values of the step length and
significance level parameters at that iteration (Δ_k' and α_k'), the percent
reduction in Q, and the number of response samples RS accumulated. Also computed for
the analysis was the percent reduction in Q from iteration k' until termination at iteration
k_t, as well as the number of additional iterations completed and response samples obtained
before termination.

Using algorithm S-MGPS-RIN as a case study for the analysis, the averages of these
quantities over 30 replications are displayed in Table 5.13 for noise case 1 and Table 5.14
for noise case 2. In the tables, the averages for percent reduction in Q (%Q), number of
iterations (Iter.), and response samples (RS) are shown twice: first for the period from
initialization to iteration k' (k < k'), then for the period from k' to termination (k' < k <
k_t). Also listed in the tables is a notional setting for the step length termination scalar Δ_t
(4.9), set to a fraction of the initial step length Δ_0.

Table 5.13. Termination Criteria Analysis for S-MGPS-RIN, Noise Case 1.

  Test                                 k < k'                     k' < k < k_t
  Problem  Δ_t      Δ_k'      α_k'     %Q       Iter.   RS        %Q      Iter.   RS
  3        .0050    .0060     .0071    99.24    64.3    2,570     0.13    22.3    97,430
* 4        .0025    1.10e-4   .0065    78.82    69.2    2,889     0.72    23.0    97,111
* 5        .0050    .0012     .0075    87.75    62.1    3,464     1.29    18.9    96,536
  25       .0200    .0474     .0112    94.37    55.0    3,887     0.43    19.5    96,113
* 36       .0100    2.04e-5   .0092    99.99    75.3    5,111     0.00    26.1    94,889
  105      .0025    .0865     .0415    82.73    40.9    5,860     0.89    20.5    94,140
  110      .0010    .0389     .0152    5.47     39.8    11,556    8.76    10.0    88,444
  118      .0400    .7401     .0385    86.84    44.1    9,110     1.78    16.8    90,890
* 224      .0050    4.28e-5   .0089    99.75    87.1    4,481     0.00    27.3    95,519
  244      .0200    .0223     .0100    45.95    62.9    4,794     0.58    22.4    95,206
  256      .0100    .0200     .0100    99.84    43.8    6,380     0.03    12.2    93,620
* 275      .0100    .0042     .0092    99.25    84.8    7,912     0.01    23.1    92,088
  281      .0050    .1969     .0235    52.87    47.8    11,065    10.46   13.7    88,935
  287      .0100    .3490     .0749    99.78    35.5    18,386    0.14    15.2    81,614
  288      .0100    .4708     .0324    98.99    31.6    29,702    0.55    7.0     70,298
  289      .0010    .0642     .0128    -20.46   67.9    57,823    0.25    16.1    442,177
  297      .0200    .2667     .0852    97.41    22.5    18,076    2.53    36.1    481,924
  300      .0010    .0652     .0391    -6.18    43.4    20,558    1.00    10.9    79,442
  301      .0200    .7875     .0698    -12.19   23.9    35,159    8.80    10.4    464,841
  305      .0200    .1760     .0451    100.0    28.2    295,210   0.00    1.1     204,790
  314      .0025    .0027     .0080    94.82    61.8    2,836     0.45    20.6    97,164
  392      .1000    2.8757    .0692    37.76    28.1    3,444     3.60    26.0    496,556
  MVP1     .0050    .0776     .0119    99.45    39.2    17,444    0.12    13.3    82,557
  MVP2     .0050    .2408     .0593    99.22    24.6    18,264    0.02    1.6     81,736
  MVP3     .0050    .0860     .0284    97.36    201.3   259,358   0.03    26.3    240,642
  MVP4     .0050    .2776     .0924    98.68    98.9    129,302   0.02    6.5     370,698

In each table, an asterisk marks the test problems that would have satisfied the
termination criteria at iteration k' if the threshold setting for the significance
level (4.8) were set to 0.01. For noise case 1, this occurred five times and for noise case
2, eight times. In ten of these 13 cases, excellent progress was made toward the optimal
objective function value (97% or higher reduction in Q). Even more telling is that in all
cases, very little progress was made from k' until termination while using significantly more
samples over fewer iterations. This is, in fact, true for nearly all of the problems, which
suggests that the ratio in (4.7) may be a useful means for selecting a stopping point.
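
The asterisk rule for Table 5.13 can be reproduced with a short sketch. It assumes that the three numeric columns following the problem number hold the termination scalar for the step length, the step length at k', and the significance level at k', and that a problem is flagged when the step length is at or below its threshold and the significance level is at or below 0.01. The exact forms of (4.8) and (4.9) are not restated here, so the comparison in would_terminate is an assumption.

```python
# (problem, step threshold, step length at k', significance level at k')
# copied from a subset of the rows of Table 5.13 (noise case 1)
rows = [
    ("3",   0.0050, 0.0060,  0.0071),
    ("4",   0.0025, 1.10e-4, 0.0065),
    ("5",   0.0050, 0.0012,  0.0075),
    ("25",  0.0200, 0.0474,  0.0112),
    ("36",  0.0100, 2.04e-5, 0.0092),
    ("224", 0.0050, 4.28e-5, 0.0089),
    ("244", 0.0200, 0.0223,  0.0100),
    ("256", 0.0100, 0.0200,  0.0100),
    ("275", 0.0100, 0.0042,  0.0092),
    ("314", 0.0025, 0.0027,  0.0080),
]

def would_terminate(step_threshold, step_k, alpha_k, alpha_threshold=0.01):
    """Assumed rule: both the step length and significance level tests pass."""
    return step_k <= step_threshold and alpha_k <= alpha_threshold

flagged = [p for p, dt, dk, ak in rows if would_terminate(dt, dk, ak)]
print(flagged)
```

For the rows listed, the flagged problems are 4, 5, 36, 224, and 275, matching the five asterisks reported for noise case 1.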

Table 5.14. Termination Criteria Analysis for S-MGPS-RIN, Noise Case 2.

  Test                                 k < k'                     k' < k < k_t
  Problem  Δ_t      Δ_k'      α_k'     %Q       Iter.   RS        %Q      Iter.   RS
  3                 .0060     .0071    99.24    64.3    2,570     0.13    22.3    97,430
* 4                 2.51e-5   .0068    97.09    70.1    2,563     0.16    26.0    97,437
  5                 .0084     .0067    96.68    62.3    2,877     0.57    19.4    97,123
* 25                .0086     .0048    91.44    71.0    5,054     0.07    18.4    94,946
* 36                1.35e-5   .0074    99.99    78.8    5,722     0.00    24.6    94,278
  105               .1360     .0025    91.48    74.6    20,308    0.45    9.2     79,692
  110      .0010    .0331     .0053    60.48    49.6    19,046    4.07    6.6     80,954
  118      .0400    .2810     .0031    95.70    77.8    25,140    0.78    8.1     74,860
* 224      .0050    5.69e-5   .0073    99.92    90.6    4,164     0.00    28.9    95,836
  244      .0200    .0243     .0074    75.25    58.4    4,002     1.36    19.1    95,998
  256      .0100    .0111     .0077    99.93    45.6    5,232     0.02    12.8    94,768
* 275      .0100    .0027     .0078    99.67    88.2    8,345     0.00    22.5    91,655
  281      .0050    .0223     .0060    93.63    50.3    17,368    0.61    7.1     82,632
  287      .0100    .0193     .0020    99.97    99.0    79,485    0.00    1.4     20,515
  288      .0100    .1959     .0074    99.99    45.8    25,781    0.00    4.6     74,219
  289      .0010    .0519     .0067    -20.86   77.2    72,421    0.27    13.3    427,579
* 297      .0200    .0164     .0066    99.99    68.8    76,727    0.01    12.3    423,273
  300      .0010    .0374     .0023    15.49    99.8    79,607    0.11    1.7     20,393
  301      .0200    .0891     .0014    9.84     84.5    319,703   0.15    4.2     180,297
* 305      .0200    .0164     .0023    100.0    73.9    432,329   0.00    0.2     67,671
* 314      .0025    .0024     .0066    97.99    63.0    3,138     0.21    20.1    96,862
  392      .1000    .8835     .0009    39.45    101.9   52,511    0.42    21.1    447,489
  MVP1     .0050    .0321     .0076    99.72    47.5    9,216     0.03    20.7    90,784
  MVP2     .0050    .2105     .0174    99.67    19.8    16,514    0.02    4.7     83,486
  MVP3     .0050    .0080     .0042    99.64    302.3   286,778   0.01    29.1    213,222
  MVP4     .0050    .0532     .0076    99.85    183.7   298,393   0.02    21.0    201,607

It should be noted, however, that for some problems, particularly in noise case 1, some
mildly significant improvement was still possible after iteration k' (e.g., problems 110, 281,
and 301). In each of these cases, the average step length had not been reduced dramatically
from its initial setting, indicating that the algorithm may still have been making many
successful moves through the design space. In these situations, it would be advantageous
to have a parameter update strategy that monitored the decay rate of Δ_k and adjusted the
decay rate of α_k and δ_k accordingly. That is, if Δ_k is decaying slowly then so should
α_k and δ_k, to allow the search to continue exploring the design space aggressively before the
sampling requirements increase to prohibitive levels.
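
One way such an update strategy might look is sketched below. It contrasts a fixed geometric schedule with a purely hypothetical rule that scales the significance level and indifference zone parameters by the relative step length, so that they decay only as fast as the step length does. The function names and the scaling rule are assumptions for illustration; whether such a rule preserves the conditions required for convergence would need to be checked separately.

```python
def geometric(p0, rho, r):
    """Fixed geometric schedule: p_r = p0 * rho**r."""
    return p0 * rho ** r

def coupled_rs_parameters(alpha0, delta0, step0, step_k, power=1.0):
    """Hypothetical rule: scale alpha and delta by (step_k / step0)**power."""
    shrink = (step_k / step0) ** power
    return alpha0 * shrink, delta0 * shrink

# Illustrative run: the step length is unchanged for three iterations,
# then halves twice
steps = [1.0, 1.0, 1.0, 0.5, 0.25]
for r, step in enumerate(steps):
    fixed = geometric(0.8, 0.5, r)
    alpha_k, delta_k = coupled_rs_parameters(0.8, 2.0, steps[0], step)
    print(r, step, fixed, alpha_k, delta_k)
```

Under the fixed schedule the significance level has already fallen by a factor of four before the step length changes at all, whereas the coupled rule leaves the R&S parameters at their initial values until the mesh is actually refined.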


5.5.4  Summary  of  the  Analysis 

The analytical results of this section provide enough evidence to draw some conclusions
regarding the performance of the algorithms on the test problems considered in these
experiments. First, the use of surrogates and/or the SSM R&S procedure appears to have a
positive effect on algorithm performance in many cases, although the importance of trading
off switching costs with algorithm progress was illustrated. Second, the stochastic
approximation algorithms generally perform better for problems with continuous variables only,
although constrained problems can present these methods with some difficulties. Finally,
the proposed MGPS-RS termination criteria seem to offer a valid mechanism to establish
some algorithm stopping decision rules, particularly if the algorithms are modified to allow
adaptive update strategies for the R&S parameters.

Chapter 6 - Conclusions and Recommendations

A new class of algorithms for solving mixed-variable stochastic optimization problems
has been presented. It further generalizes the class of generalized pattern search algorithms
to noisy objective functions, using ranking and selection statistical procedures in the
selection of new iterates. A rigorous analysis proves new convergence theorems, demonstrating
that a subsequence of iterates converges with probability one to a stationary point
appropriately defined in the mixed-variable domain. Additionally, advanced algorithm options using
modern R&S procedures, surrogate functions, and termination criteria, which provide
computational enhancements to the basic algorithm, have been developed and implemented.
Computational tests reveal that the advanced options can indeed improve performance,
allowing a performance comparison to popular methods from the stochastic optimization
literature. The original contributions of this dissertation research are summarized in
Section 6.1 and future research directions are proposed in Section 6.2.

6.1  Contributions

The primary contribution of this research is that it develops the first convergent
algorithm for numerically solving stochastic optimization problems over mixed-variable
domains. Although convergent algorithms that apply to continuous-only domains (e.g.,
stochastic approximation) or discrete-only domains (e.g., random search) have been devised,
the algorithms presented in this dissertation bridge the gap between these two domain
types, illustrating enhanced generality relative to existing methods. MGPS-RS algorithms
are further generalized by their ability to readily handle problems with variable bounds
and/or a finite number of linear constraints in addition to unconstrained problems.

Although the convergence theory in Section 3.7 builds upon existing pattern search
theory [1,13,79-81,143,146], additional mathematical constructs were required to establish
new convergence results. In particular, the conditions placed on R&S parameters α_r and
δ_r and the resulting proofs of Lemmas 3.13 and 3.14 using the Borel-Cantelli lemma are
original developments. The remaining lemmas and theorems extend existing theory by
establishing convergence in probabilistic terms appropriate for the stochastic setting.

Another important contribution of this research is the development of a viable
strategy for employing surrogate functions to augment the search. Although surrogate search
strategies for pattern search are not without precedent [30-32,39,86,127,145,147], this
research is the first to introduce kernel regression within a pattern search framework, the first
to apply surrogate-assisted pattern search algorithms in a stochastic setting, and the first
to make use of surrogates in solving MVP problems.

A third contribution is the development of effective termination criteria by combining
the traditional step length thresholding criterion with additional rules intended to avoid
unnecessary sampling. This strategy expresses the practical difference between two
candidates at termination, reflected in the indifference zone parameter, in terms of the standard
deviation of the response samples, providing a heuristic means to predict when sampling
requirements will dramatically increase. At the same time, it also ensures that the
probability of selecting the best candidate in the terminal iteration meets a minimum threshold.
Such a method is important for sampling-based methods because it provides a means to
impose controls on potentially excessive sampling requirements.

A final contribution of this research is the computational study. The literature
contains a limited number of such studies, primarily in the evaluation of direct search
methods (e.g., [6,23,62]), and these studies are restricted to unconstrained test problems of
moderate dimension, typically n = 20 or less. In the study presented in this dissertation,
substantial effort was directed toward the selection of a wide range of problem types and
algorithm implementations. The mixture of objective function types, constraint types, and
range of problem dimension is perhaps the most extensive collection to be tested in a
stochastic setting.

6.2  Future  Research 

The work presented in this dissertation can be extended in many directions.
Suggestions for future research can be generally organized into two categories: modifications to
the algorithmic framework, and extensions of the framework to a broader class of stochastic
optimization problems. These broad categories are discussed in the following subsections.

6.2.1  Modifications  to  Existing  Search  Framework 

Adaptive Parameter Updates. It was briefly mentioned in Sections 4.3 and 5.5.3
that adaptive methods for updating algorithm parameters α_r and δ_r may have the potential
to improve algorithm performance. In particular, if the parameters are decreased too
aggressively, then the algorithm increases the precision of the iterate selection decision earlier
in the iteration sequence than if the decay rate is slower. This can adversely impact
performance because the sampling requirements may increase prematurely while the search is
still actively moving through the design space, potentially slowing progress toward
optimality. It would be beneficial to investigate adaptive parameter updates based on knowledge
gained during the search. Candidate strategies might include monitoring the rate of decay
of the step length parameter or the ratio of successful iterations to unsuccessful
iterations, and using the information gained to adjust the decay rate of α_r and δ_r. Additionally,
in this research these parameters are represented as geometric sequences, but alternative
sequences may lead to better performance as long as the conditions required for algorithm
convergence are retained (assumption A6 in Section 3.7).

Selecting the Number of Design Sites. In the computational evaluation, the
number of design sites used to build the original surrogate function(s) was fixed, regardless
of the problem dimension. For the larger problems, it is possible that this led to greater
inaccuracies between the surrogate and true surfaces and to the subsequent ineffective SEARCH
steps observed in Section 5.5.1. A worthy endeavor would be to design a rule to link the
number of original design sites to problem dimension. As the dimension grows, simple
procedures may include increasing the strength of the LHS design and/or the number of
intervals p that divide each dimension.

Alternative  Kernel  Functions.  In  this  research,  Gaussian  kernels  were  used  to 
build  the  surrogate  functions,  but  alternative  mound-shaped  kernels  may  offer  advantages 
over  Gaussians.  An  example  is  the  Epanechnikov  kernel  with  parabolic  shape,  described  in 
[43]  and  [55,  pp.  25-28].  This  kernel  takes  a  value  of  zero  outside  a  fixed  interval,  which  can 
lead  to  numerical  benefits  over  a  Gaussian  kernel.  The  Gaussian  kernel  takes  on  very  small 
values  for  points  sufficiently  far  from  all  design  sites,  which  can  cause  numerical  underflow 
on  a  computer  [55,  p.  25].  There  may  be  additional  benefits  related  to  surrogate  accuracy 
that  can  be  realized  by  using  alternative  kernels. 
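
The numerical point about the two kernels can be illustrated with a small one-dimensional sketch. It assumes a Nadaraya-Watson form for the kernel regression estimate, which may differ from the surrogate construction used in this research; the bandwidth h and all function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel (unnormalized); underflows to 0.0 for large |u|."""
    return math.exp(-0.5 * u * u)

def epanechnikov_kernel(u):
    """Parabolic kernel; exactly zero outside the interval [-1, 1] by design."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_regression(x, sites, values, kernel, h):
    """Nadaraya-Watson estimate at x; None when every weight vanishes."""
    weights = [kernel((x - s) / h) for s in sites]
    total = sum(weights)
    if total == 0.0:
        return None  # avoids a 0/0 division far from all design sites
    return sum(w * v for w, v in zip(weights, values)) / total

sites = [0.0, 1.0]
values = [1.0, 3.0]
print(kernel_regression(0.5, sites, values, gaussian_kernel, 1.0))
print(gaussian_kernel(30.0), gaussian_kernel(40.0))
print(kernel_regression(100.0, sites, values, gaussian_kernel, 1.0))
print(epanechnikov_kernel(0.0), epanechnikov_kernel(1.5))
```

With the Gaussian kernel, the weights at a point far from every design site silently underflow to zero in double precision, so the estimate becomes undefined without any explicit signal. With a compactly supported kernel the same situation is explicit and can be detected by a simple distance check before any weights are computed.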

Alternative Surrogate Families. Another modification to the surrogate-based
approach is to replace kernel regression surfaces with an alternative family of surrogates. In
previous studies coupling surrogates with pattern search [30-32,39,86,127,145,147], kriging
or interpolating splines were used as the methods to approximate the response surface.
Many other methods, such as traditional polynomial regression via least squares or the use
of artificial neural networks, may be tried that might lead to improvements in algorithm
performance.

Improving the Efficiency of Searching the Surrogate. Additional efficiencies
may be realized by modifying how the search of the surrogate is conducted. In the algorithm
implementation described in Section 4.4.2, search of the surrogate surface is conducted via
a pattern search on the current mesh. However, it is also pointed out that for a very fine
mesh and a large number of design sites, this can become costly. Potential efficiencies may
be gained by replacing the pattern search with an alternative procedure, perhaps a gradient
or quasi-Newton search by deriving an analytical expression for the merit function gradient,
and then mapping the resultant point to the nearest mesh point. Another approach may be
to reduce the number of points that must be evaluated in the surrogate function by using
k-means clustering [41, pp. 526-528] to group the design sites into a smaller set of points.
Using this approach, the design sites are grouped into k clusters and the mean of each
cluster replaces all points in that cluster when evaluating the surrogate function. Many
iterative procedures exist (see [5]) to determine the number and location of the clusters
necessary to satisfy some predetermined criterion.
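
The clustering idea can be sketched with a plain implementation of Lloyd's k-means iteration. The sketch is illustrative only: it uses the first k design sites as initial centers and a fixed number of iterations, whereas the procedures cited above address how to choose the number and location of clusters. The function name and the sample design sites are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; initial centers are the first k points."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its members
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# Six design sites forming two well-separated groups
design_sites = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
                (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
reduced_sites = kmeans(design_sites, 2)
print(reduced_sites)
```

Evaluating the surrogate at the two cluster means instead of the six original sites reduces the number of kernel evaluations per surrogate call, at the cost of some accuracy in the approximation.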

Balancing Sampling and Switching Costs. The analysis of Section 5.5 revealed
that, although the SSM procedure appears to achieve better solutions over a fixed sampling
budget, it can require a large number of switches between candidate designs. As a means
to balance the cost of sampling with the cost of switching, a simple modification may
be to implement the minimum switching sequential procedure, an R&S procedure recently
developed by Hong and Nelson [59], and evaluate its performance relative to the R&S
procedures implemented in this dissertation research. Hong and Nelson's procedure, a
two-stage sampling procedure, uses the same number of switches as two-stage procedures when
additional samples are required for all candidates after the initial stage, but still maintains
sequential sampling.

Expanded Computational Testing. The computational evaluation of Chapter 5
provides valuable numerical experience and insights for MGPS-RS algorithms and their
performance relative to existing methods. However, an expanded testing program could
further enhance the understanding of when MGPS-RS might enjoy success and where
additional deficiencies reside. This testing should be broadened to more problems and may
consider some of the recommendations presented in the preceding paragraphs. Additional
value would be added with some case studies that applied the MGPS-RS algorithms to
the optimization of some real-world stochastic systems for which representative simulation
models exist.

6.2.2  Extensions  to  Broader  Problem  Classes 

Relaxing the Smoothness Assumption. A restrictive assumption of the
convergence analysis is that the true objective function is continuously differentiable with respect
to the continuous variables when the discrete variables are fixed. Applying the Clarke
calculus [35] in the deterministic setting, Audet and Dennis [14] relax this assumption and
present a hierarchy of convergence results where the strength of the results depends on local
smoothness properties. A worthy research avenue would be to further extend the results
for the stochastic setting in the context of the MGPS-RS framework.

Nonlinear Constraints. In this research, the constraints are restricted to bound
and linear constraints only. A worthwhile extension to MGPS-RS algorithms would be to
make them applicable to nonlinear constraints also, perhaps by adapting tools from any
of the three pattern search methods applied to nonlinearly constrained deterministic
problems discussed in Section 2.2.1: the augmented Lagrangian approach of Lewis and Torczon
[82], the filter method of Audet and Dennis [16], and the Mesh Adaptive Direct Search
(MADS) algorithm of Audet and Dennis [15]. In particular, the MADS algorithm extends
pattern search by generating mesh directions that become dense in the limit. However, as
these directions become dense, their number becomes unbounded. Since this is prohibitive
in practice, an implementable instance is provided in [15], in which positive spanning
directions are randomly selected at each iteration. The cost of doing so, however, is the
weaker condition of convergence with probability 1 to a stationary point. Extending any
of the deterministic algorithms [15,16,82] to the stochastic setting results in the weaker
convergence results, but nothing is lost with MADS, since its convergence result is already
in probabilistic terms.

Multiple  Responses.  In  this  research,  the  target  problem  class  contains  only  a  single 
system  response  output,  the  minimization  of  which  is  the  objective  of  the  optimization  task. 
This  problem  class  can  be  broadened  to  problems  that  have  multiple  response  outputs. 
Depending  on  the  objectives  of  the  optimization  problem,  the  additional  responses  can 
be  considered  in  two  different  ways:  (a)  as  additional  constraints,  or  (b)  as  additional 
objectives.  In  the  first  case,  the  additional  responses  may  be  constrained  to  a  specified 
performance  range.  Since  these  responses  may  not  be  linear,  any  of  the  techniques  suggested 
in  the  preceding  paragraph  for  handling  nonlinear  constraints  could  be  employed.  However, 
the  stochastic  nature  of  the  new  constraints  may  require  additional  assumptions  to  ensure 
sound  theoretical  convergence  results. 

In the second case, the target problem becomes one with multiple objectives. For
multi-objective stochastic optimization problems, specialized statistical procedures are needed to
select iterates and retain the rigor of the selection. An approach for simulation optimization
using direct search is suggested in [90], employing Hotelling's procedure. This approach
compares the incumbent with a single candidate in two testing phases: a first
phase consisting of an all-pairwise two-sample comparison of means of all responses, followed
by a second phase consisting of a two-sample comparison of means on a weighted sum of all responses.
After the first phase, if at least one response mean of the candidate is significantly deficient
or no response mean differs significantly, the candidate is rejected. If all response
means are significantly improved, then the candidate is accepted. However, if at least
one response mean is significantly improved while at least one shows no significant difference,
then the second phase is conducted, in which the candidate is accepted if the weighted sum
function is significantly improved and rejected otherwise. This approach is extended in [89]
to  account  for  correlation  between  the  different  responses.  Employing  such  a  method  within 
the  pattern  search  framework  in  lieu  of  an  R&S  procedure  to  select  new  iterates  may  be  a 
worthwhile  research  area  for  multi-objective  stochastic  optimization.  Alternatively,  some 
limited  work  in  indifference-zone  R&S  procedures  has  been  done  to  extend  these  techniques 
to  multiple  responses  (see  [140,  pp.  139-140]  for  a  brief  review),  which  would  coincide  more 
closely  with  the  framework  presented  in  this  dissertation.  It  would  be  useful  to  explore 
iterative  use  of  these  methods  within  MGPS-RS  for  multi-objective  stochastic  optimization. 
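
The two-phase acceptance rule described above can be sketched as a small decision function. This is only an illustration of the branching logic, not the procedure of [90] itself: the statistical tests are assumed to have been run already, and the names `Outcome`, `accept_candidate`, and `weighted_sum_improved` are hypothetical.

```python
from enum import Enum
from typing import Callable, Sequence


class Outcome(Enum):
    """Phase-one result for one response (candidate versus incumbent)."""
    DEFICIENT = -1      # candidate mean is significantly worse
    INSIGNIFICANT = 0   # no statistically significant difference
    IMPROVED = 1        # candidate mean is significantly better


def accept_candidate(
    phase_one: Sequence[Outcome],
    weighted_sum_improved: Callable[[], bool],
) -> bool:
    """Return True if the candidate replaces the incumbent.

    phase_one holds one Outcome per response; weighted_sum_improved is a
    callback (an assumption of this sketch) that runs the phase-two test on
    the weighted sum of responses and is called only when it is needed.
    """
    if any(o is Outcome.DEFICIENT for o in phase_one):
        return False                      # at least one response got worse
    if all(o is Outcome.INSIGNIFICANT for o in phase_one):
        return False                      # nothing changed significantly
    if all(o is Outcome.IMPROVED for o in phase_one):
        return True                       # every response improved
    # Mixed case: some improved, the rest insignificant -> phase two decides.
    return weighted_sum_improved()
```

Passing the phase-two test as a callback keeps the extra sampling it requires from being spent on candidates that phase one already settles.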
APPENDIX A - Test Problem Details


This Appendix provides the details of the twenty-two continuous-variable test
problems used in the computational evaluation. The problem numbers are as assigned in the
publications from which they were selected [58, 122]. Each of the problems is classified
according to the category combination defined in Section 5.3. The classification scheme has
a  three  letter  designator:  the  first  letter  is  the  objective  function  type  (Q  -  quadratic,  S 
-  sum  of  squares,  P  -  generalized  polynomial,  G  -  general  nonlinear);  the  second  letter  is 
the  constraint  information  (U  -  unconstrained,  B  -  bounds  only,  L  -  linear  constraints  and 
bounds);  and  the  third  letter  designates  problem  size  (S  -  small,  M  -  medium,  L  -  large). 
Each  problem  listing  also  includes  the  number  of  variables,  the  number  of  bounds,  the 
number  of  linear  constraints,  the  objective  functional  form,  the  starting  point,  the  optimal 
solution,  and  the  form  of  the  constraints. 
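
As a reading aid, the three-letter designator can be decoded mechanically. The following is a minimal sketch; the function name `decode_category` and the dictionary names are hypothetical and appear nowhere in the original work.

```python
# Letter meanings taken from the classification scheme described above.
OBJECTIVE = {
    "Q": "quadratic",
    "S": "sum of squares",
    "P": "generalized polynomial",
    "G": "general nonlinear",
}
CONSTRAINTS = {
    "U": "unconstrained",
    "B": "bounds only",
    "L": "linear constraints and bounds",
}
SIZE = {"S": "small", "M": "medium", "L": "large"}


def decode_category(code: str) -> tuple[str, str, str]:
    """Split a designator such as 'QBS' into its three descriptions."""
    if len(code) != 3:
        raise ValueError(f"expected three letters, got {code!r}")
    obj, con, size = code.upper()
    try:
        return OBJECTIVE[obj], CONSTRAINTS[con], SIZE[size]
    except KeyError as exc:
        raise ValueError(f"unknown letter in designator {code!r}") from exc
```

For example, `decode_category("QBS")` describes a small quadratic problem with bounds only.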


Problem 3

Category combination: QBS
Number of variables: 2
Number of bounds: 1
Number of linear constraints: 0
Objective function: $f(x) = x_2 + 10^{-5}(x_2 - x_1)^2$
Bounds: $0 \le x_2$
Starting point: $x = (10, 1)$, $f(x) = 1.0081$
Optimal solution: $x^* = (0, 0)$, $f(x^*) = 0$


Problem 4


Category combination: PBS
Number of variables: 2
Number of bounds: 2
Number of linear constraints: 0
Objective function: $f(x) = \frac{1}{3}(x_1 + 1)^3 + x_2$
Bounds: $1 \le x_1$, $0 \le x_2$
Starting point: $x = (1.125, 0.125)$, $f(x) = 3.323568$
Optimal solution: $x^* = (1, 0)$, $f(x^*) = \frac{8}{3}$


Problem 5

Category combination: GBS
Number of variables: 2
Number of bounds: 4
Number of linear constraints: 0
Objective function: $f(x) = \sin(x_1 + x_2) + (x_1 - x_2)^2 - 1.5x_1 + 2.5x_2 + 1$
Bounds: $-1.5 \le x_1 \le 4$, $-3 \le x_2 \le 3$
Starting point: $x = (0, 0)$, $f(x) = 1$
Optimal solution: $x^* = \left(-\frac{\pi}{3} + \frac{1}{2}, -\frac{\pi}{3} - \frac{1}{2}\right)$


Problem 25


Category combination: SBS
Number of variables: 3
Number of bounds: 6
Number of linear constraints: 0
Objective function: $f(x) = \sum_{i=1}^{99} (f_i(x))^2$
    where $f_i(x) = -0.01i + \exp\left(-\frac{1}{x_1}(u_i - x_2)^{x_3}\right)$
    and $u_i = 25 + (-50 \ln(0.01i))^{2/3}$, $i = 1, \ldots, 99$
Bounds: $0.1 \le x_1 \le 100$, $0 \le x_2 \le 25.6$, $0 \le x_3 \le 5$
Starting point: $x = (100, 12.5, 3)$, $f(x) = 32.835$
Optimal solution: $x^* = (50, 25, 1.5)$, $f(x^*) = 0$


Problem 36

Category combination: PLS
Number of variables: 3
Number of bounds: 6
Number of linear constraints: 1
Objective function: $f(x) = -x_1 x_2 x_3$
Bounds: $0 \le x_1 \le 20$, $0 \le x_2 \le 11$, $0 \le x_3 \le 42$
Starting point: $x = (10, 10, 10)$, $f(x) = -1000$
Optimal solution: $x^* = (20, 11, 15)$, $f(x^*) = -3300$
Constraint: $x_1 + 2x_2 + 2x_3 \le 72$


Problem 105


Category combination: GLS
Number of variables: 8
Number of bounds: 16
Number of linear constraints: 1
Objective function: $f(x) = -\sum_{i=1}^{235} \ln\left((a_i(x) + b_i(x) + c_i(x))/\sqrt{2\pi}\right)$
    where for $i = 1, \ldots, 235$,
    $a_i(x) = \frac{x_1}{x_6} \exp\left(-(y_i - x_3)^2 / 2x_6^2\right)$,
    $b_i(x) = \frac{x_2}{x_7} \exp\left(-(y_i - x_4)^2 / 2x_7^2\right)$,
    $c_i(x) = \frac{1 - x_2 - x_1}{x_8} \exp\left(-(y_i - x_5)^2 / 2x_8^2\right)$,
    and $y_i$ is as defined in the table that follows
Bounds: $0.001 \le x_i \le 0.499$, $i = 1, 2$,
    $100 \le x_3 \le 180$, $120 \le x_4 \le 210$, $170 \le x_5 \le 240$,
    $5 \le x_i \le 25$, $i = 6, 7, 8$
Starting point: $x = (.1, .2, 100, 125, 175, 11.2, 13.2, 15.8)$,
    $f(x) = 1297.6693$
Optimal solution: $x^* = (.4128928, .4033526, 131.2613, 164.3135,
    217.4222, 12.28018, 15.77170, 20.74682)$,
    $f(x^*) = 1138.416240$
Constraint: $x_1 + x_2 \le 1$
Additional Data ($y_i$) for Problem 105.

    i         y_i      i          y_i      i          y_i
    1          95      102-118    150      199-201    200
    2         105      119-122    155      202-204    205
    3-6       110      123-142    160      205-212    210
    7-10      115      143-150    165      213        215
    11-25     120      151-167    170      214-219    220
    26-40     125      168-175    175      220-224    230
    41-55     130      176-181    180      225        235
    56-68     135      182-187    185      226-232    240
    69-89     140      188-194    190      233        245
    90-101    145      195-198    195      234-235    250
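
Because the data for Problem 105 are given as index ranges, an implementation has to expand them into the 235 individual values $y_i$ first. The sketch below shows one way to do so. The names `Y_TABLE` and `expand_y` are hypothetical, and the value for $i = 3, \ldots, 6$ is taken to be 110, which fits the spacing of 5 between neighbouring entries.

```python
# (first index, last index, y value), transcribed from the table above.
Y_TABLE = [
    (1, 1, 95), (2, 2, 105), (3, 6, 110), (7, 10, 115), (11, 25, 120),
    (26, 40, 125), (41, 55, 130), (56, 68, 135), (69, 89, 140),
    (90, 101, 145), (102, 118, 150), (119, 122, 155), (123, 142, 160),
    (143, 150, 165), (151, 167, 170), (168, 175, 175), (176, 181, 180),
    (182, 187, 185), (188, 194, 190), (195, 198, 195), (199, 201, 200),
    (202, 204, 205), (205, 212, 210), (213, 213, 215), (214, 219, 220),
    (220, 224, 230), (225, 225, 235), (226, 232, 240), (233, 233, 245),
    (234, 235, 250),
]


def expand_y(table):
    """Expand index ranges into a flat list; y[i - 1] holds y_i."""
    y = []
    for first, last, value in table:
        # Each range must start right after the previous one ends.
        if first != len(y) + 1:
            raise ValueError(f"gap or overlap before index {first}")
        y.extend([value] * (last - first + 1))
    return y


y = expand_y(Y_TABLE)
```

The contiguity check guards against transcription slips, since a missing or overlapping range would otherwise silently shift every later value.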

Problem  110 

Category  combination:  GBM 

Number  of  variables:  10 

Number  of  bounds:  20 

Number  of  linear  constraints:  0 


Objective function: $f(x) = \sum_{i=1}^{10} \left[(\ln(x_i - 2))^2 + (\ln(10 - x_i))^2\right] - \left(\prod_{i=1}^{10} x_i\right)^{0.2}$

Bounds: $2.001 \le x_i \le 9.999$, $i = 1, \ldots, 10$

Starting point: $x = (9, \ldots, 9)$, $f(x) = -43.134337$

Optimal solution: $x^* = (9.35025655, \ldots, 9.35025655)$,
    $f(x^*) = -45.77846971$


Problem 118


Category combination: QLM
Number of variables: 15
Number of bounds: 30
Number of linear constraints: 29
Objective function: $f(x) = \sum_{i=1}^{4} \left(2.3x_{3i+1} + 0.0001x_{3i+1}^2 + 1.7x_{3i+2} + 0.0001x_{3i+2}^2 + 2.2x_{3i+3} + 0.00015x_{3i+3}^2\right)$
Bounds: $8 \le x_1 \le 21$, $43 \le x_2 \le 57$, $3 \le x_3 \le 16$,
    $0 \le x_{3i+1} \le 90$, $0 \le x_{3i+2} \le 120$, $0 \le x_{3i+3} \le 60$,
    $i = 1, 2, 3, 4$
Starting point: $x = (20, 55, 15, 20, 60, 20, 20, 60, 20, 20, 60, 20, 20, 60, 20)$,
    $f(x) = 769.8400$
Optimal solution: $x^* = (8, 49, 3, 1, 56, 0, 1, 63, 6, 3, 70, 12, 5, 77, 18)$,
    $f(x^*) = 556.2726$
Constraints: $l \le Ax \le u$, where
    $l = [-7, -7, -7, -7, -7, -7, -7, -7, -7, -7, -7, -7, 60, 50, 70, 85, 100]^T$,
    $u = [6, 6, 6, 6, 7, 7, 7, 7, 6, 6, 6, 6, \infty, \infty, \infty, \infty, \infty]^T$, and

    A =
     -1  0  0  1  0  0  0  0  0  0  0  0  0  0  0
      0  0  0 -1  0  0  1  0  0  0  0  0  0  0  0
      0  0  0  0  0  0 -1  0  0  1  0  0  0  0  0
      0  0  0  0  0  0  0  0  0 -1  0  0  1  0  0
      0 -1  0  0  1  0  0  0  0  0  0  0  0  0  0
      0  0  0  0 -1  0  0  1  0  0  0  0  0  0  0
      0  0  0  0  0  0  0 -1  0  0  1  0  0  0  0
      0  0  0  0  0  0  0  0  0  0 -1  0  0  1  0
      0  0 -1  0  0  1  0  0  0  0  0  0  0  0  0
      0  0  0  0  0 -1  0  0  1  0  0  0  0  0  0
      0  0  0  0  0  0  0  0 -1  0  0  1  0  0  0
      0  0  0  0  0  0  0  0  0  0  0 -1  0  0  1
      1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  1  1  1  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  1  1  1  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  1  1  1  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  1  1  1
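
The listing for Problem 118 can be checked numerically. The following minimal sketch evaluates the objective as printed (summing over $i = 1, \ldots, 4$) and tests $l \le Ax \le u$ row by row without forming $A$ explicitly. The function and constant names are hypothetical, and the variable bounds are not checked.

```python
import math

X_START = [20, 55, 15, 20, 60, 20, 20, 60, 20, 20, 60, 20, 20, 60, 20]
X_OPT = [8, 49, 3, 1, 56, 0, 1, 63, 6, 3, 70, 12, 5, 77, 18]

# Limits in the row order of A: twelve difference rows, then five block sums.
L = [-7] * 12 + [60, 50, 70, 85, 100]
U = [6] * 4 + [7] * 4 + [6] * 4 + [math.inf] * 5


def objective_118(x):
    """Objective as printed in the listing, summed over i = 1..4."""
    total = 0.0
    for i in range(1, 5):
        # 0-based positions of x_{3i+1}, x_{3i+2}, x_{3i+3}
        a, b, c = x[3 * i], x[3 * i + 1], x[3 * i + 2]
        total += (2.3 * a + 0.0001 * a ** 2 + 1.7 * b + 0.0001 * b ** 2
                  + 2.2 * c + 0.00015 * c ** 2)
    return total


def constraint_values(x):
    """Entries of Ax: differences x_{m+3} - x_m grouped by position, then sums."""
    diffs = [x[3 * k + 3 + j] - x[3 * k + j] for j in range(3) for k in range(4)]
    sums = [x[3 * k] + x[3 * k + 1] + x[3 * k + 2] for k in range(5)]
    return diffs + sums


def is_feasible(x):
    """True if every linear constraint row lies within its limits."""
    return all(lo <= v <= hi for lo, v, hi in zip(L, constraint_values(x), U))
```

Evaluated this way, the starting point and the listed optimum reproduce the tabulated objective values 769.84 and 556.2726, and both points satisfy the linear constraints.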


Problem  224 

Category  combination:  QLS 

Number  of  variables:  2 

Number  of  bounds:  4 

Number  of  linear  constraints:  4 

Objective function: $f(x) = 2x_1^2 + x_2^2 - 48x_1 - 40x_2$

Bounds: $0 \le x_1 \le 6$, $0 \le x_2 \le 6$

Starting point: $x = (0.1, 0.1)$, $f(x) = -8.77$

Optimal solution: $x^* = (4, 4)$, $f(x^*) = -304$

Constraints: $l \le Ax \le u$, where

    $l = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $A = \begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix}$, and $u = \begin{pmatrix} 18 \\ 8 \end{pmatrix}$


Problem 244


Category  combination:  SUS 

Number  of  variables:  3 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 

Objective function: $f(x) = \sum_{i=1}^{10} \left[\exp(-x_1 z_i) - x_3 \exp(-x_2 z_i) - y_i\right]^2$
    where $y_i = \exp(-z_i) - 5\exp(-10z_i)$
    and $z_i = 0.1i$, $i = 1, \ldots, 10$

Starting point: $x = (1, 2, 1)$, $f(x) = 1.59884$

Optimal solution: $x^* = (1, 10, 5)$, $f(x^*) = 0$

Problem  256 

Category  combination:  PUS 

Number  of  variables:  4 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 

Objective function: $f(x) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4$ (Powell function)

Starting point: $x = (3, -1, 0, 1)$, $f(x) = 215$

Optimal solution: $x^* = (0, 0, 0, 0)$, $f(x^*) = 0$


Problem 275


Category  combination:  QUS 

Number  of  variables:  4 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 
Objective function: $f(x) = x^T Q x$
    where $Q(i, j) = \frac{1}{i + j - 1}$ ($4 \times 4$ Hilbert matrix)

Starting point: $x = (-4, -2, -1.333, -1)$, $f(x) = 33.9651$

Optimal solution: $x^* = (0, 0, 0, 0)$, $f(x^*) = 0$


Problem  281 

Category  combination:  GUM 

Number  of  variables:  10 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 


Objective function: $f(x) = \left(\sum_{i=1}^{10} i^3 (x_i - 1)^2\right)^{1/3}$

Starting point: $x = (0, \ldots, 0)$, $f(x) = 14.4624$

Optimal solution: $x^* = (1, \ldots, 1)$, $f(x^*) = 0$


Problem 287


Category  combination:  PUM 

Number  of  variables:  20 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 

Objective function: $f(x) = \sum_{i=1}^{5} \left[100(x_i^2 - x_{i+5})^2 + (x_i - 1)^2 + 90(x_{i+10}^2 - x_{i+15})^2 + (x_{i+10} - 1)^2 + 10.1\left((x_{i+5} - 1)^2 + (x_{i+15} - 1)^2\right) + 19.8(x_{i+5} - 1)(x_{i+15} - 1)\right]$

Starting point: $x_i = -3$, $i = 1, \ldots, 5, 11, \ldots, 15$,
    $x_i = -1$, $i = 6, \ldots, 10, 16, \ldots, 20$,
    $f(x) = 95960$

Optimal solution: $x^* = (1, \ldots, 1)$, $f(x^*) = 0$

Problem  288 

Category  combination:  SUM 

Number  of  variables:  20 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 

Objective function: $f(x) = \sum_{i=1}^{5} \left[(x_i + 10x_{i+5})^2 + 5(x_{i+10} - x_{i+15})^2 + (x_{i+5} - 2x_{i+10})^4 + 10(x_i - x_{i+15})^4\right]$

Starting point: $x_i = 3$, $i = 1, \ldots, 5$, $x_i = -1$, $i = 6, \ldots, 10$,
    $x_i = 0$, $i = 11, \ldots, 15$, $x_i = 1$, $i = 16, \ldots, 20$,
    $f(x) = 1075$

Optimal solution: $x^* = (0, \ldots, 0)$, $f(x^*) = 0$


Problem 289


Category combination: GUL

Number of variables: 30

Number of bounds: 0

Number of linear constraints: 0

Objective function: $f(x) = 1 - \exp\left(-\frac{1}{60} \sum_{i=1}^{30} x_i^2\right)$

Starting point: $x = (-1.03, 1.07, -1.10, 1.13, -1.17, 1.20, -1.23, 1.27,
    -1.30, 1.33, -1.37, 1.40, -1.43, 1.47, -1.50, 1.53,
    -1.57, 1.60, -1.63, 1.67, -1.70, 1.73, -1.77, 1.80,
    -1.83, 1.87, -1.90, 1.93, -1.97, 2.00)$,
    $f(x) = 0.696313$

Optimal solution: $x^* = (0, \ldots, 0)$, $f(x^*) = 0$

Problem 297

Category combination: SUL

Number of variables: 30

Number of bounds: 0

Number of linear constraints: 0

Objective function: $f(x) = \sum_{i=1}^{29} \left[100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2\right]$ (Rosenbrock banana function)

Starting point: $x = (-1.2, 1, \ldots, -1.2, 1)$, $f(x) = 7139$

Optimal solution: $x^* = (1, \ldots, 1)$, $f(x^*) = 0$


Problem 300


Category  combination:  QUM 

Number  of  variables:  20 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 

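Objective function: $f(x) = x^T Q x - 2x_1$

    where $Q = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}$

Starting point: $x = (0, \ldots, 0)$, $f(x) = 0$

Optimal solution: $x^* = (20, 19, 18, \ldots, 2, 1)$, $f(x^*) = -20$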


Problem  301 

Category  combination:  QUL 

Number  of  variables:  50 

Number  of  bounds:  0 

Number  of  linear  constraints:  0 

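Objective function: $f(x) = x^T Q x - 2x_1$

    where $Q = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}$

Starting point: $x = (0, \ldots, 0)$, $f(x) = 0$

Optimal solution: $x^* = (50, 49, 48, \ldots, 2, 1)$, $f(x^*) = -50$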
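
Problems 300 and 301 share the same tridiagonal $Q$ and differ only in dimension, so a single routine can evaluate both. The sketch below exploits the band structure instead of storing $Q$. The function names are hypothetical and the code is an illustration, not part of the original test suite.

```python
def tridiagonal_objective(x):
    """f(x) = x^T Q x - 2 x_1, with Q having diagonal (1, 2, ..., 2)
    and -1 on the first sub- and super-diagonals."""
    n = len(x)
    quad = x[0] ** 2 + 2 * sum(v * v for v in x[1:])
    quad -= 2 * sum(x[i] * x[i + 1] for i in range(n - 1))
    return quad - 2 * x[0]


def q_times(x):
    """Product Qx, used to check stationarity: the gradient is 2Qx - 2e_1."""
    n = len(x)
    out = []
    for i in range(n):
        diag = 1 if i == 0 else 2
        left = x[i - 1] if i > 0 else 0
        right = x[i + 1] if i < n - 1 else 0
        out.append(diag * x[i] - left - right)
    return out


x300 = list(range(20, 0, -1))   # (20, 19, ..., 1)
x301 = list(range(50, 0, -1))   # (50, 49, ..., 1)
```

At the listed solutions, `q_times` returns the first unit vector, so the gradient $2Qx - 2e_1$ vanishes and the objective equals $-x_1$, which gives $-20$ and $-50$ respectively.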
Problem 305

Category combination: PUL
Number of variables: 100
Number of bounds: 0
Number of linear constraints: 0
Objective function: $f(x) = \sum_{i=1}^{100} x_i^2 + \left(\sum_{i=1}^{100} \frac{i}{2} x_i\right)^2 + \left(\sum_{i=1}^{100} \frac{i}{2} x_i\right)^4$
Starting point: $x = (0.1, \ldots, 0.1)$, $f(x) = 4064923200$
Optimal solution: $x^* = (0, \ldots, 0)$, $f(x^*) = 0$

Problem 314

Category combination: GUS
Number of variables: 2
Number of bounds: 0
Number of linear constraints: 0
Objective function: $f(x) = (x_1 - 2)^2 + (x_2 - 1)^2 + \frac{0.04}{g(x)} + \frac{(h(x))^2}{0.2}$,
    where $g(x) = -\frac{1}{4}x_1^2 - x_2^2 + 1$,
    and $h(x) = x_1 - 2x_2 + 1$
Starting point: $x = (2, 2)$, $f(x) = 5.99$
Optimal solution: $x^* = (1.789, 1.374)$, $f(x^*) = 0.169040$


Problem 392


Category  combination:  QLL 

Number  of  variables:  30 

Number  of  bounds:  45 

Number  of  linear  constraints:  30 

5  3 

Objective  function:  f{x)  =  ^  ^ 

i=i  j=i  ^ 

{kiji  “h  kpji)xi2~\-3i-\-j 

{ksji  H"  kpiji)[xi2~\-3i+j  3) 

-kL2ji  '^^^^(.^12+j+3e  -  Xj-3+3e) 

where  T'2jii  kj\ji,  kiji,  kpji,  /csji,  kpijp  and  kp2ji 
are  as  defined  in  Table  that  follows 

Bounds: $0 \le x_1 \le 100$, $0 \le x_2 \le 280$, $0 \le x_3 \le 520$,
    $0 \le x_4 \le 180$, $0 \le x_5 \le 400$, $0 \le x_6 \le 400$,
    $0 \le x_7 \le 220$, $0 \le x_8 \le 450$, $0 \le x_9 \le 500$,
    $0 \le x_{10} \le 150$, $0 \le x_{11} \le 450$, $0 \le x_{12} \le 630$,
    $0 \le x_{13} \le 100$, $0 \le x_{14} \le 400$, $0 \le x_{15} \le 600$,
    $x_i \ge 0$, $i = 16, \ldots, 30$

Starting point: $x = (80, 50, 370, 100, 150, 200, 100, 250, 400, 50,
    200, 500, 50, 200, 500, 100, 120, 410, 120, 190,
    190, 60, 240, 370, 130, 100, 510, 30, 250, 510)$,³
    $f(x) = -845999$

Optimal solution: $x^* = (99.99, 142.22, 519.88, 136.74, 103.47, 399.99,
    191.70, 1.56, 500, 143.43, 82.39, 629.82,
    99.92, 125.22, 600, 101.85, 142.25, 519.88,
    144.58, 105.73, 409.59, 182.01, 29.34, 490.52,
    143.43, 52.43, 629.70, 99.92, 125.12, 600)$,
    $f(x^*) = -1693551.668$

Constraints: $l \le Ax \le u$, where
    $l_i = -\infty$, $i = 1, \ldots, 15$, $l_i = 0$, $i = 16, \ldots, 30$,
    $u_i = 170$, $i = 1, 2, 4, 5, 7, 8, 10, 11, 13, 14$,
    $u_i = 180$, $i = 3, 6, 9, 12, 15$, $u_i = \infty$, $i = 16, \ldots, 30$,

³ Starting point was modified from published version to make it feasible.
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^  =  [  ^2  ]  , 

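      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
     -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      0 -1  0  0  0  0  0  0  0  0  0  0  0  0  0
      0  0 -1  0  0  0  0  0  0  0  0  0  0  0  0
     -1  0  0 -1  0  0  0  0  0  0  0  0  0  0  0
      0 -1  0  0 -1  0  0  0  0  0  0  0  0  0  0
      0  0 -1  0  0 -1  0  0  0  0  0  0  0  0  0
     -1  0  0 -1  0  0 -1  0  0  0  0  0  0  0  0
      0 -1  0  0 -1  0  0 -1  0  0  0  0  0  0  0
      0  0 -1  0  0 -1  0  0 -1  0  0  0  0  0  0
     -1  0  0 -1  0  0 -1  0  0 -1  0  0  0  0  0
      0 -1  0  0 -1  0  0 -1  0  0 -1  0  0  0  0
      0  0 -1  0  0 -1  0  0 -1  0  0 -1  0  0  0
     -1  0  0 -1  0  0 -1  0  0 -1  0  0 -1  0  0
      0 -1  0  0 -1  0  0 -1  0  0 -1  0  0 -1  0
      0  0 -1  0  0 -1  0  0 -1  0  0 -1  0  0 -1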
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and 
Additional Data for Problem 392.

    j = 1      j = 2      j = 3
APPENDIX B - Test Result Data


Some additional details from the test results are presented in this appendix. Section
B.1 presents a series of charts that show the iteration history of the algorithms for all
test problems. Section B.2 provides the data to support the ANOVA and nonparametric
procedures of Section 5.5.1.

B.1 Iteration History Charts

To give a visual perspective of algorithm progression, the graphs on the pages that follow
plot performance measure Q, averaged over 30 replications,
versus the number of response samples obtained for each algorithm, noise case, and test
problem. The graphs are shown on log scales so that the progression in the latter stages of
the search can be seen more easily. In the graph legends, the algorithms are
referred to as follows:

•  RIN  -  MGPS  with  Rinott’s  procedure  and  no  surrogates, 

•  SSM  -  MGPS  with  Sequential  Selection  with  Memory  procedure  and  no 
surrogates, 

•  SAS  -  MGPS  with  Screen-and-Select  procedure  and  no  surrogates, 

•  S-RIN  -  Surrogate  assisted  MGPS  with  Rinott’s  procedure, 

•  S-SSM  -  Surrogate  assisted  MGPS  with  Sequential  Selection  with  Memory 
procedure, 

•  S-SAS  -  Surrogate  assisted  MGPS  with  Screen-and-Select  procedure, 

• FDSA - Finite-Difference Stochastic Approximation,

•  SPSA  -  Simultaneous  Perturbation  Stochastic  Approximation, 

•  RNDS  -  Random  Search,  and 

•  NM  -  Nelder-Mead  simplex  search. 

For the continuous-variable problems, the MGPS-RS variants are plotted on the
left of the page; on the right, all remaining algorithm implementations are plotted together with
S-SSM for a visual comparison with an MGPS-RS variant.

[Iteration history charts for each test problem, Noise Case 1 and Noise Case 2.]

B.2 Statistical Analysis Data Summary

Data necessary to support the ANOVA of Section 5.5.1 are provided in Table B.1 for each
of the performance measures Q and P. The transformation function used for experiment
outcomes $Q_{ijk\ell}$ and $P_{ijk\ell}$, for $1 \le i \le 3$ (index on R&S procedure), $1 \le j \le 2$ (index on the
use of surrogates), $1 \le k \le 2$ (index on noise case), and $1 \le \ell \le 30$ (index on replication),
is shown in the $T(Q_{ijk\ell})$ and $T(P_{ijk\ell})$ columns, respectively. Also shown are the test statistic
W for the Shapiro-Wilk test for nonnormality of the studentized residuals and the p-value
associated with that test. The closer W is to unity, the more likely the data will be accepted
as normal. The null hypothesis is that the data are normal, so a p-value greater than .05
indicates that the null hypothesis is not rejected at the .05 significance level. On the
pages following Table B.3, a series of charts is shown for each test problem that plots the
studentized residuals versus their predicted values as a visual test of the constant variance
assumption, and the normal probability plot of the studentized residuals as a visual test of
the normality assumption.

The results of the nonparametric tests are presented in Tables B.2 and B.3 for
performance measures Q and P, respectively. In the tables, p-values are displayed that test
for differences in the distributions as a result of the R&S procedure (RS), the use of surrogates
(SRCH), and noise. The column headings are defined as follows:

•  WIL  -  Wilcoxon  rank-sum  procedure, 

• KW - Kruskal-Wallis procedure,

•  MED  -  two-sample  median  procedure, 

•  BM  -  Brown-Mood  procedure,  and 

•  VDW  -  van  der  Waerden  procedure. 

Both WIL and KW test the null hypothesis that the independent samples represent
populations with the same median value. The MED, BM, and VDW procedures test
the null hypothesis that the independent samples derive from the same distribution. In
all cases, a p-value greater than .05 indicates that the null hypothesis is not rejected
at the .05 significance level. P-values that are enclosed by a rectangle signify tests that
disagree with the results of the ANOVA and multiple comparison procedures of Section
5.5.1. For example, in test problem 36, the ANOVA procedure specified the effect of the
R&S procedure to be significant at the .05 level for performance measure Q, but each of
the three nonparametric tests fails to reject its null hypothesis at this significance level.
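
To illustrate how a p-value in the KW column for the R&S factor could be obtained, the following sketch computes the Kruskal-Wallis statistic H from raw samples. It is not the software used for the study. It assumes exactly three groups (one per R&S procedure), applies no tie correction, and relies on the large-sample chi-square approximation, whose survival function with two degrees of freedom is $\exp(-H/2)$. All function names are hypothetical.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), no tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2.0)
```

With 30 replications per cell, the chi-square approximation is usually considered adequate. A statistics package would additionally apply the tie correction omitted here.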
Table B.1. Transformation functions and Shapiro-Wilk Nonnormality Test Results.

Test       Performance Measure Q               Performance Measure P
Problem    T(Q_ijkl)      W       p-value      T(P_ijkl)      W       p-value
3          log(Q)         .993    .199         Pi             .858    .000
4          log(Q + 1)     .806    .000         (P + 1)^-1     .854    .000
5          log(Q)         .990    .015         log(P)         .983    .000
25                        .986    .002         P              .982    .000
36         log(Q)         .963    .000         log(P)         .968    .000
105        Q              .925    .000         P              .903    .000
110        Q              .990    .015         P              .993    .078
118        log(Q)         .961    .000         log(P)         .946    .000
224        log(Q)         .980    .000         Pi             .992    .059
244        log(Q)         .967    .000         P              .990    .016
256        log(Q)         .977    .000         Pi             .988    .004
275        log(Q)         .966    .000         P^2            .998    .879
281        (Q + 1)^       .951    .000         P^2            .964    .000
287        Q-"            .862    .000         P^-2           .901    .000
288        log(Q)         .980    .000         log(P)         .982    .000
289        q"             .886    .000         P^-1           .895    .000
297        log(Q)         .982    .000         log(P)         .904    .000
300        log(Q)         .931    .000         log(P)         .825    .000
301        log(Q)         .965    .000         P              .957    .000
305        log(Q)         .788    .000         P^-1           .772    .000
314        log(Q)         .939    .000         P^2            .990    .017
392        Q              .889    .000         P              .931    .000
MVP1       log(Q)         .994    .174         log(P)         .981    .000
MVP2       log(Q)         .996    .437         log(P)         .960    .000
MVP3       log(Q)         .904    .000         log(P)         .937    .000
MVP4       log(Q)         .944    .000         log(P)         .925    .000
Table B.2. P-values for Nonparametric Tests - Performance Measure Q.

Test       RS                     SRCH                   Noise
Problem    KW     BM     VDW      WIL    MED    VDW      WIL    MED    VDW
Table B.3. P-values for Nonparametric Tests - Performance Measure P.

Test       RS                     SRCH                   Noise
Problem    KW     BM     VDW      WIL    MED    VDW      WIL    MED    VDW
3          .470   .292   .424     .000   .000   .000     .760   .674   .601
4          .055   .792   .032     .000   .000   .000     .039   .400   .016
5          .734   .876   .625     .064   .207   .120     .000   .012   .000
25         .653   .967   .485     .000   .000   .000     .000   .000   .037
36         .491   .792   .485     .086   1.00   .032     .024   .528   .017
105        .507   .088   .420     .000   .000   .000     .000   .000   .000
110        .318   .532   .281     .932   .293   .798     .000   .000   .000
118        .838   .532   .922     .000   .000   .000     .000   .000   .000
224        .128   .274   .160     .730   .833   .508     .000   .000   .000
244        .867   .875   .815     .000   .000   .000     .080   .207   .039
256        .076   .108   .134     .210   .400   .063     .000   .000   .000
275        .817   .741   .886     .499   .528   .426     .841   .528   .786
281        .034   .292   .007     .817   .833   .859     .000   .000   .000
287        .062   .080   .040     .000   .000   .000     .000   .674   .000
288        .000   .088   .000     .105   .027   .436     .000   .000   .000
289        .410   .967   .247     .000   .000   .000     .706   .833   .763
297        .000   .000   .000     .346   .599   .391     .000   .000   .000
300        .586   .741   .723     .000   .000   .000     .000   .006   .000
301        .004   .357   .000     .491   .528   .587     .000   .000   .000
305        .000   .024   .000     .314   .141   .477     .000   .000   .000
314        .087   .357   .112     .030   .141   .042     .000   .000   .000
392        .052   .039   .045     .000   .000   .000     .000   .000   .000
MVP1       .067   .240   .041     .317   .141   .436     .000   .001   .000
MVP2       .000   .000   .000     .589   .204   .737     .000   .003   .000
MVP3       .001   .028   .000     .339   .833   .107     .000   .000   .000
MVP4       .078   .967   .019     .682   .400   .328     .000   .000   .000
[Charts for each test problem (3 through 392 and MVP1 through MVP4): Residual vs. Predicted Plot and Normal Probability Plot of the studentized residuals, for Performance Measure Q and Performance Measure P.]